| 4 | 1000 | 42.60 | 42.61 | 42.61 | 
| 8 | 1000 | 45.14 | 45.14 | 45.14 | 
| 16 | 1000 | 42.83 | 42.84 | 42.84 | 
| 32 | 1000 | 46.46 | 46.46 | 46.46 | 
| 64 | 1000 | 44.62 | 44.63 | 44.62 | 
| 128 | 1000 | 58.27 | 58.29 | 58.28 | 
| 256 | 1000 | 61.13 | 61.15 | 61.15 | 
| 512 | 1000 | 70.58 | 70.60 | 70.60 | 
| 1024 | 1000 | 81.64 | 81.66 | 81.65 | 
| 2048 | 1000 | 113.24 | 113.29 | 113.27 | 
| 4096 | 1000 | 158.73 | 158.80 | 158.78 | 
| 8192 | 1000 | 296.67 | 296.83 | 296.78 | 
| 16384 | 1000 | 534.17 | 534.48 | 534.39 | 
| 32768 | 1000 | 925.54 | 926.11 | 925.76 | 
| 65536 | 640 | 1643.30 | 1644.20 | 1643.76 | 
| 131072 | 320 | 1211.07 | 1211.61 | 1211.35 | 
| 262144 | 160 | 2377.06 | 2379.35 | 2378.28 | 
| 524288 | 80 | 9937.20 | 9945.09 | 9941.42 | 
| 1048576 | 40 | 14141.08 | 14171.55 | 14157.80 | 
| 2097152 | 20 | 23278.50 | 23407.60 | 23348.72 | 
| 4194304 | 10 | 41601.71 | 42125.80 | 41887.28 | 
Stream
This is sample output of a Stream test on a cluster of 349 nodes.
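The Rate column in the output that follows can be cross-checked from the reported array size and the best (minimum) time. The Python sketch below is a minimal illustration, not part of the benchmark itself. It assumes the usual STREAM accounting: Copy and Scale touch two arrays per iteration, Add and Triad touch three, rates are reported in units of 10^6 bytes per second, and "Total memory required" is reported in units of 2^20 bytes. The names `ARRAYS_TOUCHED`, `stream_rate_mb_s`, and `total_memory_mb` are hypothetical helpers introduced only for this example.

```python
# Sketch: re-derive STREAM's "Rate (MB/s)" from array size and best time.
# Assumptions (not stated in this document): standard STREAM byte accounting,
# rate in 1e6 bytes/s, total memory in 2**20 bytes.

BYTES_PER_WORD = 8        # "This system uses 8 bytes per DOUBLE PRECISION word."
ARRAY_SIZE = 44739242     # "Array size = 44739242"

# Number of arrays each kernel reads or writes per iteration (assumed).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Bandwidth in MB/s (1e6 bytes/s) computed from the best time of a kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time_s


def total_memory_mb() -> float:
    """Memory for the three work arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * ARRAY_SIZE / 2**20


# "Min time" values reported for node1 in the sample output.
node1_min_times = {"Copy": 0.2671, "Scale": 0.2747, "Add": 0.3475, "Triad": 0.3478}

for kernel, t in node1_min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):10.1f} MB/s")
print(f"Total memory: {total_memory_mb():.1f} MB")
```

Because the printed times are rounded to four decimal places, the recomputed rates agree with the reported ones only to within about 1 MB/s. Only the "Min time" column feeds the rate; the average and maximum times are informational.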
| Running Memory Benchmark |
| node1: |
| node1: This system uses 8 bytes per DOUBLE PRECISION word. |
| node1: |
| node1: Array size = 44739242, Offset = 0 |
| node1: Total memory required = 1024.0 MB. |
| node1: Each test is run 25 times, but only |
| node1: the *best* time for each is used. |
| node1: Function | Rate (MB/s) | Avg time | Min time | Max time |
| node1: Copy: | 2679.4987 | 0.2566 | 0.2671 | 0.2675 |
| node1: Scale: | 2606.1366 | 0.2640 | 0.2747 | 0.2776 |
| node1: Add: | 3090.3320 | 0.3339 | 0.3475 | 0.3507 |
| node1: Triad: | 3086.9809 | 0.3342 | 0.3478 | 0.3488 |
| node1: |
| node1: Solution Validates |
| node1: |
| node9: |
| node9: This system uses 8 bytes per DOUBLE PRECISION word. |
| node9: |
| node9: Array size = 44739242, Offset = 0 |
| node9: Total memory required = 1024.0 MB. |
| node9: Each test is run 25 times, but only |
| node9: the *best* time for each is used. |
| node9: Function | Rate (MB/s) | Avg time | Min time | Max time |
| node9: Copy: | 2672.2059 | 0.2582 | 0.2679 | 0.2714 |
| node9: Scale: | 2605.7793 | 0.2648 | 0.2747 | 0.2781 |
| node9: Add: | 3095.3829 | 0.3345 | 0.3469 | 0.3518 |
| node9: Triad: | 3093.9731 | 0.3348 | 0.3470 | 0.3522 |
| node9: |
| node9: Solution Validates |
| node9: |
| node24: |
| node24: This system uses 8 bytes per DOUBLE PRECISION word. |
| node24: |
| node24: Array size = 44739242, Offset = 0 |
| node24: Total memory required = 1024.0 MB. |
| node24: Each test is run 25 times, but only |
| node24: the *best* time for each is used. |
| node24: Function | Rate (MB/s) | Avg time | Min time | Max time |
| node24: Copy: | 2662.2282 | 0.2587 | 0.2689 | 0.2725 |
| node24: Scale: | 2599.2867 | 0.2649 | 0.2754 | 0.2786 |
| node24: Add: | 3081.9215 | 0.3353 | 0.3484 | 0.3533 |
