Lines Matching refs:to

19 * however the loop has been unrolled to enable better memory throughput,
21 * and __memset16 to permit better scheduling to eliminate the stalling
23 * A future enhancement might be to put in a byte store loop for really
26 * WARNING: Maintaining this is going to be more work than the above version,
27 * as fixes will need to be made in multiple places. The performance gain
47 * Serious stalling happens. The only way to mitigate this is to
48 * undertake a major re-write to interleave the constant materialization
58 addq $18,$16,$6 # E : max address to write to
99 * We are now guaranteed to be quad aligned, with at least
100 * one partial quad to write.
103 sra $18,3,$3 # U : Number of remaining quads to write
104 and $18,7,$18 # E : Number of trailing bytes to write
109 * it's worth the effort to unroll this and use wh64 if possible
114 * $6 The max quadword address to write to
116 * $3 Number quads to write
120 subq $3, 16, $4 # E : Only try to unroll if > 128 bytes
126 * through unrolled loop. Do a quad at a time to get us 0mod64
148 * $3 - number quads left to go
150 * $17 - mask of stuff to store
154 * Assumes the wh64 needs to be for 2 trips through the loop in the future
213 stq $1,0($5) # L : And back to memory
236 * This is the original body of code, prior to replication and
237 * rescheduling. Leave it here, as there may be calls to this
246 addq $18,$16,$6 # E : max address to write to
277 * We are now guaranteed to be quad aligned, with at least
278 * one partial quad to write.
281 sra $18,3,$3 # U : Number of remaining quads to write
282 and $18,7,$18 # E : Number of trailing bytes to write
287 * it's worth the effort to unroll this and use wh64 if possible
292 * $6 The max quadword address to write to
294 * $3 Number quads to write
298 subq $3, 16, $4 # E : Only try to unroll if > 128 bytes
304 * through unrolled loop. Do a quad at a time to get us 0mod64
326 * $3 - number quads left to go
328 * $17 - mask of stuff to store
332 * Assumes the wh64 needs to be for 2 trips through the loop in the future
391 stq $1,0($5) # L : And back to memory
415 * to mask stalls. Note that entry point names also had to change
427 addq $18,$16,$6 # E : max address to write to
465 * We are now guaranteed to be quad aligned, with at least
466 * one partial quad to write.
469 sra $18,3,$3 # U : Number of remaining quads to write
470 and $18,7,$18 # E : Number of trailing bytes to write
475 * it's worth the effort to unroll this and use wh64 if possible
480 * $6 The max quadword address to write to
482 * $3 Number quads to write
486 subq $3, 16, $4 # E : Only try to unroll if > 128 bytes
492 * through unrolled loop. Do a quad at a time to get us 0mod64
514 * $3 - number quads left to go
516 * $17 - mask of stuff to store
520 * Assumes the wh64 needs to be for 2 trips through the loop in the future
579 stq $1,0($5) # L : And back to memory
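
The matches at lines 103-104 and 120 of the listed file (repeated at 281-282/298 and 469-470/486) describe how the remaining byte count is split once the destination is quad aligned. The following is a minimal C sketch of that arithmetic only, not the routine itself. The names `fill_plan` and `plan_fill` are hypothetical and do not appear in the source, and the sketch assumes the branch following `subq $3, 16, $4` skips the unrolled path whenever the result is not positive.

```c
#include <stddef.h>

/* Hypothetical helper type: illustrative only, not taken from the source. */
struct fill_plan {
    size_t quads;          /* whole 8-byte stores, cf. "sra $18,3,$3"   */
    size_t trailing_bytes; /* leftover bytes, cf. "and $18,7,$18"       */
    int    use_unrolled;   /* nonzero when more than 16 quads remain    */
};

/*
 * Split a byte count (already adjusted for the aligned start) into
 * quadwords and trailing bytes, and decide whether the unrolled loop
 * is worth entering. The threshold mirrors "subq $3, 16, $4"; the
 * sign test on the result is an assumption about the following branch.
 */
static struct fill_plan plan_fill(size_t count)
{
    struct fill_plan p;

    p.quads = count >> 3;           /* count / 8 */
    p.trailing_bytes = count & 7;   /* count % 8 */
    p.use_unrolled = p.quads > 16;  /* i.e. quads - 16 > 0 */
    return p;
}
```

Under these assumptions, a count of 131 bytes gives 16 quads and 3 trailing bytes and stays on the simple loop, while 136 bytes (17 quads) is the smallest count that takes the unrolled path.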