Optimize work-stealing deque
- Add padding to avoid false sharing
- Use a GADT to express the desired result type
- Use various tweaks to improve performance
- Remove a negative test that uses the work-stealing deque in an invalid, unsafe way
- Implement caching of the thief-side index
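As an illustration of the GADT bullet, here is a minimal sketch of how a GADT witness can let a single `pop` implementation either return an `'a option` or return the bare value and raise on empty. It assumes this is roughly what "express the desired result type" means. The names `poly`, `Option`, `Value`, `pop_as`, `pop` and `pop_opt` are hypothetical and do not come from this change. A plain `list ref` stands in for the deque so that only the result-type dispatch is shown. It does not show the actual concurrent deque.

```ocaml
(* Hypothetical sketch: a GADT witness chooses the result type of [pop_as].
   A [list ref] stands in for the deque; this is NOT the real WS deque. *)
type ('a, _) poly =
  | Option : ('a, 'a option) poly  (* caller wants ['a option] *)
  | Value : ('a, 'a) poly          (* caller wants ['a]; raise on empty *)

(* One shared body; the witness refines [r] in each branch. *)
let pop_as : type a r. (a, r) poly -> a list ref -> r =
 fun poly stack ->
  match (!stack, poly) with
  | x :: rest, Option -> stack := rest; Some x
  | x :: rest, Value -> stack := rest; x
  | [], Option -> None
  | [], Value -> raise Exit

(* Two user-facing entry points built from the same code path. *)
let pop_opt s = pop_as Option s
let pop s = pop_as Value s
```

One benefit of this design is that the option-returning and raising variants share one code path without allocating an intermediate option in the raising case.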