The NEON code was not paying attention to which registers are scratch.
Avoid the need to preserve registers by not using registers
in the Q4-Q7 range, which are callee-saved under the ARM calling convention.
Fix ScaleDown2Int_NEON by changing how rounding was applied.
Change ScaleDownRow4 to process 4 output pixels per loop iteration.
UV Transpose no longer needs to push/pop registers; removed those
functions.
Fix the CPU flag in scale_test.cc so optimizations can be turned on/off
for timing.
Review URL: http://webrtc-codereview.appspot.com/259002
git-svn-id: http://libyuv.googlecode.com/svn/trunk@58 16f28f9a-4ce2-e073-06de-1de4eb20be90