Lines Matching refs:and

5  * Common Development and Distribution License, Version 1.0 only
12 * and limitations under the License.
15 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
52 * ! of copy and flags. Set up error handling accordingly.
53 * ! The transition point depends on whether the src and
58 * ! For FP version, %l6 holds previous error handling and
61 * ! So either %l6 or %o4 is reserved and not available for
110 * restore error handler and exit.
119 * restore error handler and exit.
123 * ! method: line up src and dst as best possible, then
210 * We've tried to restore fp state from the stack and failed. To
220 * avoiding a register save and restore. Also, less elaborate setup
222 * For longer copies, especially unaligned ones (where the src and
229 * moved whether the FP registers need to be saved, and some other
231 * 400 clocks. Since each non-repeated/predicted tst and branch costs
233 * longer copies and only benefit a small portion of medium sized
248 * is more data and that data is not in cache, failing to prefetch
251 * The exact tradeoff is strongly load and application dependent, with
262 * hw_copy_limit_1 = src and dst are byte aligned but not halfword aligned
263 * hw_copy_limit_2 = src and dst are halfword aligned but not word aligned
264 * hw_copy_limit_4 = src and dst are word aligned but not longword aligned
265 * hw_copy_limit_8 = src and dst are longword aligned
267 * To say that src and dst are word aligned means that after
269 * both the src and dst will be on word boundaries so that
270 * word loads and stores may be used.
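A minimal C sketch of the classification described by the fragments above, assuming the hw_copy_limit_? tunables named there; the helper name, the uint_t types, and the header are assumptions, since only the tunable names appear in this listing:

    #include <sys/types.h>

    extern uint_t hw_copy_limit_1;  /* byte aligned but not halfword */
    extern uint_t hw_copy_limit_2;  /* halfword aligned but not word */
    extern uint_t hw_copy_limit_4;  /* word aligned but not longword */
    extern uint_t hw_copy_limit_8;  /* longword aligned */

    /*
     * "Word aligned" above means src and dst can be brought onto word
     * boundaries together, i.e. the low-order bits of the two addresses
     * agree; XOR of the addresses exposes exactly that.
     */
    static uint_t
    mutual_alignment_limit(const void *src, const void *dst)
    {
            uintptr_t diff = (uintptr_t)src ^ (uintptr_t)dst;

            if ((diff & 0x7) == 0)
                    return (hw_copy_limit_8);
            if ((diff & 0x3) == 0)
                    return (hw_copy_limit_4);
            if ((diff & 0x1) == 0)
                    return (hw_copy_limit_2);
            return (hw_copy_limit_1);
    }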
273 * on Cheetah+ (900MHz), Cheetah++ (1200MHz), and Jaguar (1050MHz):
282 * If hw_copy_limit_? is set to a value between 1 and VIS_COPY_THRESHOLD (256)
285 * It is provided to allow for disabling FPBLK copies and to allow
292 * saves an alignment test, memory reference, and enabling test
296 * non-predicted tst and branch costs around 10 clocks.
297 * If src and dst are randomly selected addresses,
302 * But, tests on running kernels show that src and dst to copy code
303 * are typically not on random alignments. Structure copies and
314 * We subdivide the non-FPBLK case further into CHKSIZE bytes and less
316 * align src and dst. We try to minimize special case tests in
321 * src and dst alignment and provide special cases for each of
323 * to decide between short and medium size was chosen to be 39
325 * shift and 4 times 8 bytes for the first long word unrolling.
334 * branch instruction on Cheetah, Jaguar, and Panther, the
340 * and nops which are not executed in the code. This
345 * instruction and the unrolled loops, then the alignment needs
350 * a non-predicted tst and branch takes 10 clocks, this savings
362 * three iterations later and shows a measured improvement
371 * Notes on preserving existing fp state and on membars.
375 * preserve - the rest of the kernel does not use fp and, anyway, fp
378 * - userland has fp state and is interrupted (device interrupt
379 * or trap) and within the interrupt/trap handling we use
384 * userland or in kernel copy) and the tl0 component of the handling
386 * - a user process with fp state incurs a copy-on-write fault and
390 * using our stack is ideal (and since fp copy cannot be leaf optimized
395 * nops (those semantics always apply) and #StoreLoad is implemented
418 * ourselves and it is our cpu which will take any trap.
428 * and reboot the system (or restart the service with Greenline/Contracts).
432 * the event and the trap PC may not be the PC of the faulting access.
438 * is no need to repeat this), and we must force delivery of deferred
444 * Since the copy operations may preserve and later restore floating
449 * To make sure that floating point state is always saved and restored
455 * use. Bit 2 (TRAMP_FLAG) indicates that the call was to bcopy, and a
495 * Entry points bcopy, copyin_noerr, and copyout_noerr use this flag.
496 * kcopy, copyout, xcopyout, copyin, and xcopyin do not set this flag.
504 * Testing with 1200 MHz Cheetah+ and Jaguar gives best results with
505 * two prefetches, one with a reach of 8*BLOCK_SIZE+8 and one with a
508 * for the improvement is that with Cheetah and Jaguar, some prefetches
522 * floating-point register save area and 2 64-bit temp locations.
556 * Copy functions use either quadrants 1 and 3 or 2 and 4.
558 * FZEROQ1Q3: Zero quadrants 1 and 3, ie %f0 - %f15 and %f32 - %f47
559 * FZEROQ2Q4: Zero quadrants 2 and 4, ie %f16 - %f31 and %f48 - %f63
601 * Macros to save and restore quadrants 1 and 3 or 2 and 4 to/from the stack.
602 * Used to save and restore in-use fp registers when we want to use FP
603 * and find fp already in use and copy size still large enough to justify
604 * the additional overhead of this save and restore.
618 * original data, and a membar #Sync after restore lets the block loads
622 * and before using the BLD_*_FROMSTACK macro.
628 and tmp1, -VIS_BLOCKSIZE, tmp1 /* block align */ ;\
637 and tmp1, -VIS_BLOCKSIZE, tmp1 /* block align */ ;\
646 and tmp1, -VIS_BLOCKSIZE, tmp1 /* block align */ ;\
655 and tmp1, -VIS_BLOCKSIZE, tmp1 /* block align */ ;\
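Each of the four macros above rounds a scratch stack address down to a block boundary with "and tmp1, -VIS_BLOCKSIZE, tmp1"; a one-line C equivalent of that masking step (the VIS_BLOCKSIZE value and the helper name are assumptions):

    #include <stdint.h>

    #define VIS_BLOCKSIZE   64      /* assumed; must be a power of two */

    /* Round addr down to the previous VIS_BLOCKSIZE boundary. */
    static inline uintptr_t
    block_align_down(uintptr_t addr)
    {
            return (addr & -(uintptr_t)VIS_BLOCKSIZE);
    }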
663 * FP_NOMIGRATE and FP_ALLOWMIGRATE. Prevent migration (or, stronger,
665 * switch) before commencing a FP copy, and reallow it on completion or
674 * CPU we perform the copy on and so that we know which CPU failed
676 * This could be achieved through disabling preemption (and we have done it that
827 * Errno value is in %g1. bcopy_more uses fp quadrants 1 and 3.
835 and %l6, TRAMP_FLAG, %l0 ! copy trampoline flag to %l0
856 ! and bcopy. kcopy will *always* set a t_lofault handler
858 ! and *not* to invoke any existing error handler. As far as
925 * Assumes double word alignment and a count >= 256.
1138 ! Now long word aligned and have at least 32 bytes to move
1188 ! Now word aligned and have at least 36 bytes to move
1235 ! Now half word aligned and have at least 38 bytes to move
1265 * profiling and dtrace of the portions of the copy code that uses
1285 ! kcopy and bcopy use the same code path. If TRAMP_FLAG is set
1286 ! and the saved lofault was zero, we won't reset lofault on
1523 subcc %o0, %o1, %o3 ! difference of from and to address
1530 2: cmp %o2, %o3 ! cmp size and abs(from - to)
1533 cmp %o0, %o1 ! compare from and to addresses
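The three instructions above are the overlap test: if the distance between the source and destination is at least the copy length, a plain forward copy is safe. A hedged C restatement (the function name is illustrative, not from this file):

    #include <stddef.h>
    #include <stdint.h>

    /* Nonzero when [from, from+size) and [to, to+size) overlap. */
    static int
    regions_overlap(const void *from, const void *to, size_t size)
    {
            uintptr_t f = (uintptr_t)from;
            uintptr_t t = (uintptr_t)to;
            uintptr_t dist = (f > t) ? f - t : t - f;   /* abs(from - to) */

            return (dist < size);
    }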
1570 * has already disabled kernel preemption and has checked
1710 * Transfer data to and from user space -
1715 * Note that copyin(9F) and copyout(9F) are part of the
1720 * So there are two extremely similar routines - xcopyin() and xcopyout()
1726 * There are also stub routines for xcopyout_little and xcopyin_little,
1737 * The only difference between copy{in,out} and
1757 * data copying algorithm and the default limits.
2055 ! Now long word aligned and have at least 32 bytes to move
2113 ! Now word aligned and have at least 36 bytes to move
2166 ! Now half word aligned and have at least 38 bytes to move
2220 * profiling and dtrace of the portions of the copy code that uses
2850 ! Now long word aligned and have at least 32 bytes to move
2905 ! Now word aligned and have at least 36 bytes to move
2957 ! Now half word aligned and have at least 38 bytes to move
3008 * profiling and dtrace of the portions of the copy code that uses
3613 * and returns 1. Otherwise 0 is returned indicating success.
3614 * Caller is responsible for ensuring use_hw_bzero is true and that
3640 ! ... and must be 256 bytes or more
3645 ! ... and length must be a multiple of VIS_BLOCKSIZE
3665 and %l1, -VIS_BLOCKSIZE, %l1
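Read together, the last few fragments describe the qualification gate of the hardware block-clear path: the caller is responsible for use_hw_bzero, and the routine itself returns 1 (declining, so the caller falls back) unless the request is at least 256 bytes and a multiple of VIS_BLOCKSIZE. A hedged C restatement of that gate; the VIS_BLOCKSIZE value, the helper name, and the address-alignment check (suggested by the masking line above rather than spelled out in these fragments) are assumptions:

    #include <stddef.h>
    #include <stdint.h>

    #define VIS_BLOCKSIZE   64      /* assumed block size */

    /*
     * Return 1 if the block-clear fast path must be declined, 0 if the
     * request qualifies.  Checking use_hw_bzero is the caller's job per
     * the comment above, so it is not repeated here.
     */
    static int
    hwblkclr_would_decline(const void *addr, size_t len)
    {
            if (((uintptr_t)addr & (VIS_BLOCKSIZE - 1)) != 0)
                    return (1);     /* not block aligned (assumed) */
            if (len < 256)
                    return (1);     /* must be 256 bytes or more */
            if ((len & (VIS_BLOCKSIZE - 1)) != 0)
                    return (1);     /* length not a multiple of block */
            return (0);
    }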