History log of /u-boot/arch/riscv/lib/memcpy.S
Revision Date Author Comments
# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 3c60e59a 03-Jan-2023 Rick Chen <rick@andestech.com>

riscv: memcpy: check src and dst before copy

Add src and dst address checking, if they
are the same address, just return and don't
copy data anymore.

Signed-off-by: Rick Chen <rick@andestech.com>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 703b84ec 13-May-2021 Bin Meng <bmeng.cn@gmail.com>

riscv: Fix memmove and optimise memcpy when misalign

At present U-Boot SPL fails to boot on SiFive Unleashed board, due
to a load address misaligned exception happens when loading the FIT
image in spl_load_simple_fit(). The exception happens in memmove()
which is called by fdt_splice_().

Commit 8f0dc4cfd106 introduces an assembly version of memmove but
it does take misalignment into account (it checks if length is a
multiple of machine word size but pointers need also be aligned).
As a result it will generate misaligned load/store for the majority
of cases and causes significant performance regression on hardware
that traps misaligned load/store and emulate them using firmware.

The current behaviour of memcpy is that it checks if both src and
dest pointers are co-aligned (aka congruent modular SZ_REG). If
aligned, it will copy data word-by-word after first aligning
pointers to word boundary. If src and dst are not co-aligned,
however, byte-wise copy will be performed.

This patch was taken from the Linux kernel patch [1], which has not
been applied at the time being. It fixes the memmove and optimises
memcpy for misaligned cases. It will first align destination pointer
to word-boundary regardless whether src and dest are co-aligned or
not. If they indeed are, then wordwise copy is performed. If they
are not co-aligned, then it will load two adjacent words from src
and use shifts to assemble a full machine word. Some additional
assembly level micro-optimisation is also performed to ensure more
instructions can be compressed (e.g. prefer a0 to t6).

With this patch, U-Boot boots again on SiFive Unleashed board.

[1] https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

Fixes: 8f0dc4cfd106 ("riscv: assembler versions of memcpy, memmove, memset")
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>


# 8f0dc4cf 26-Mar-2021 Heinrich Schuchardt <xypron.glpk@gmx.de>

riscv: assembler versions of memcpy, memmove, memset

Provide optimized versions of memcpy(), memmove(), memset() copied from
the Linux kernel.

Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Reviewed-by: Leo Yu-Chi Liang <ycliang@andestech.com>