I had a look at the implementation of `slice`, and I found it odd that it doesn't have a copy loop. The current implementation does a raw memcpy of the underlying contiguous row-major data. As far as I can tell, this could only slice along rows. Interestingly, I found that the tests only tested for slicing along rows, so this bug would go unnoticed.
I added some tests that checks slicing along columns also. I have a feeling this would break, and we need to fix the implementation of `slice`. However I could be wrong, and hence I'm submitting these tests first to verify.