The conventional physics way of explaining this is as follows:
One way of asking "what is the current state of the universe?" is to pick a Cauchy surface. This is just a "slice" of the entire universe at a given time. There is a lot of freedom in the choice of slice: in Minkowski space, for example, there are slices corresponding to every choice of rest frame, and many more besides those. We just need to make sure that no point on the surface lies within another point's light cone.
The information (about field values & derivatives) lying on any particular Cauchy surface is enough to predict the future and past from that surface. Pick any two Cauchy surfaces, and there's a unitary operator mapping one to the other. This is the relativistic version of a time-evolution operator.
Some Cauchy surfaces are entirely later in time than other Cauchy surfaces. (Though some pairs of Cauchy surfaces are partially later and partially earlier than each other.) We'll say that for Cauchy surfaces $A, B$, that $A \le B$ exactly when for all points $a \in A$, $b \in B$, either $a$ is spacelike separated from $b$ or $b$ is in the future lightcone of $a$.
Let $S$ be a function that measures the entropy on a given Cauchy surface. The second law of thermodynamics then says that if $A \le B$ then $S(A) \le S(B)$.
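I think it's because Schroedinger's equation is
$$i\hbar \frac{\partial \psi}{\partial t} = H\psi,$$
which, breaking $\psi$ into its energy eigenstates $\phi_n$ with energies $E_n$, gives
$$\psi(t) = \sum_n c_n e^{-iE_n t/\hbar} \phi_n.$$
In Minkowski space, the time is literally imaginary. What if we rotated it in the complex plane so that it's just like the other dimensions? This is called the Wick rotation, $t \to -i\tau$, and we recover the Boltzmann (Gibbs) distribution
$$p_n \propto e^{-E_n \tau/\hbar},$$
or equivalently
$$p_n \propto e^{-\beta E_n}, \qquad \beta = \tau/\hbar.$$
$\beta$ takes on the role of inverse temperature. Now, what is the entropy-maximizing distribution? Let
$$S = -\sum_n p_n \ln p_n.$$
We want
$$\max_p S \quad \text{subject to} \quad \sum_n p_n = 1, \quad \sum_n p_n E_n = \langle E \rangle.$$
Lagrange multipliers give
$$p_n = \frac{e^{-\beta E_n}}{Z}, \qquad Z = \sum_m e^{-\beta E_m},$$
or the same Boltzmann (Gibbs) distribution we saw earlier. So basically, if we switch to a coordinate axis where time is identical to the other dimensions, we find that evolution through time is equivalent to a maximum-entropy distribution in this other coordinate axis.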
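As a sanity check on the maximum-entropy claim, here is a minimal numerical sketch. It assumes a made-up three-level system (the energies, the value of `beta`, and the perturbation are all illustrative choices of mine, not anything from the argument above) and checks that nudging the Boltzmann distribution $p_n \propto e^{-\beta E_n}$ in a direction that keeps both the normalisation and the mean energy fixed can only lower the entropy.

```python
import math

def boltzmann(energies, beta):
    """Return p_n = exp(-beta * E_n) / Z for the given energy levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(p):
    """Gibbs/Shannon entropy S = -sum p ln p (terms with p = 0 contribute 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mean_energy(p, energies):
    """Expected energy sum_n p_n * E_n."""
    return sum(x * e for x, e in zip(p, energies))

# Hypothetical three-level system; these numbers are illustrative only.
energies = [0.0, 1.0, 2.0]
beta = 0.7
p = boltzmann(energies, beta)

# d = (1, -2, 1) satisfies sum(d) = 0 and sum(d * E) = 0 for these evenly
# spaced levels, so p +/- eps * d has the same total probability and the
# same mean energy as p.
d = [1.0, -2.0, 1.0]
eps = 0.01
p_plus = [x + eps * dx for x, dx in zip(p, d)]
p_minus = [x - eps * dx for x, dx in zip(p, d)]

print(entropy(p), entropy(p_plus), entropy(p_minus))
```

Both perturbed distributions should come out with strictly lower entropy than `p`, which is what you'd expect if the Boltzmann form really is the constrained maximum.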
Now, this is almost circular reasoning, because why is Schroedinger's equation the way it is? Basically, if you have some observer $t$ (careful here! we're reusing variable names with different meanings!), and you have some object $\psi$, and you say that $\psi$ looks the same to the observer as the observer changes, that's written mathematically as
$$\frac{d\psi}{dt} = A\psi,$$
where $A$ is some matrix/linear transformation. So
$$\psi(t) = e^{At}\psi(0).$$
After a long enough time, we're really only looking at rotation. So, why is our "observer" the same as the time dimension? Well, it's not. But imagine it moves mostly in that dimension, far faster than in the other three, with a little of those mixed in. Then that mixing will make it so the entropy-maximizing distribution we see in 4 space dimensions should also be entropy-maximizing if we Wick rotate any one of those dimensions.
Like every other confusing word, time is overloaded. It means at least two different things:
1: The direction in which entropy increases, and hence in which we experience things.
2: That one dimension of the 4 dimensions of spacetime that is timelike in the Lorentzian sense. Namely, the interval between two events in Minkowski space is defined in terms of 4 dimensions as
$$s^2 = c^2 \Delta t^2 - \Delta x^2 - \Delta y^2 - \Delta z^2.$$
One of those dimensions has a positive sign in the formula, the others negative. We call the one with a positive sign "time".
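To make the sign convention concrete, here is a small sketch that computes the squared interval between two events and labels the separation. It assumes the (+, -, -, -) convention described above; the function names are my own.

```python
C = 299_792_458.0  # speed of light in metres per second

def interval_squared(dt, dx, dy, dz, c=C):
    """s^2 = (c*dt)^2 - dx^2 - dy^2 - dz^2, using the (+, -, -, -) convention."""
    return (c * dt) ** 2 - dx ** 2 - dy ** 2 - dz ** 2

def classify(dt, dx, dy, dz, c=C):
    """Label the separation between two events by the sign of s^2."""
    s2 = interval_squared(dt, dx, dy, dz, c)
    if s2 > 0:
        return "timelike"
    if s2 < 0:
        return "spacelike"
    return "lightlike"

# One second apart at the same place: timelike.
print(classify(1.0, 0.0, 0.0, 0.0))
# One metre apart at the same instant: spacelike.
print(classify(0.0, 1.0, 0.0, 0.0))
```

Two events that a slower-than-light particle could connect come out timelike, and two simultaneous events at different places come out spacelike.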
Now why are these two directions approximately the same? Is that by chance or is it for a fundamental reason?
Instead of doing research on this, I thought about it for a while, so I might be talking complete nonsense, but here's what I think I've discovered.
To think about this correctly it helps to discard our usual model of the world, and think about spacetime as a 4d space, rather than time as a separate thing. Since imagining 4d things is hard, let's discard one of the space dimensions so that we can use the X and Z coordinates to represent space, and the Y coordinate to represent time.
Also, to make the maths easier, let's scale the chart so that our units on the X and Z axes are 299,792,458 metres each, and our units on the Y axis are seconds. Then light travels 1 unit in the Y direction for every 1 unit it travels in the X/Z plane.
In this model we model particles as strings rather than points, where each string traces the history of a particle over time. When we take a cross-section of spacetime along any surface, we see the particles as the points we're used to instead of strings.
Now what makes the time dimension special is that none of these strings can ever have an angle greater than 45 degrees relative to the Y axis (i.e. they can't go faster than the speed of light). One straightforward impact of that is the strings can never loop back on themselves, so that any cross section that is orthogonal to the time axis will never have the same particle multiple times. A cross section orthogonal to the X or Z axis does not have this property, and the same particle is likely to appear in one of them multiple times.
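Here is a rough sketch of that asymmetry, assuming a made-up worldline $x = 0.5 \sin(y)$ in the scaled units (so its speed never exceeds half the speed of light). Because the string is a function of the time coordinate, any cross-section at fixed time meets it exactly once, but a cross-section at fixed X can meet it many times. The worldline and the sampling resolution are illustrative assumptions.

```python
import math

def worldline_x(t):
    """Hypothetical worldline x(t) = 0.5 * sin(t), in units where c = 1."""
    return 0.5 * math.sin(t)

def max_speed(t_start, t_end, steps=20_000):
    """Largest finite-difference speed |dx/dt| along the sampled worldline."""
    dt = (t_end - t_start) / steps
    return max(
        abs(worldline_x(t_start + (i + 1) * dt) - worldline_x(t_start + i * dt)) / dt
        for i in range(steps)
    )

def count_crossings_of_x_plane(x0, t_start, t_end, steps=20_000):
    """Count sign changes of x(t) - x0, i.e. crossings of the surface x = x0."""
    crossings = 0
    prev = worldline_x(t_start) - x0
    for i in range(1, steps + 1):
        t = t_start + (t_end - t_start) * i / steps
        cur = worldline_x(t) - x0
        if prev * cur < 0:
            crossings += 1
        prev = cur
    return crossings

print(max_speed(0.0, 20.0))                         # stays below 1 (the speed of light)
print(count_crossings_of_x_plane(0.25, 0.0, 20.0))  # more than one crossing
```

The string stays inside the 45 degree limit the whole way, yet the surface X = 0.25 sees the same particle several times.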
Note that just as our choices of X and Z axes are arbitrary, our choice of Y axis is arbitrary. I can pick any direction that is less than 45 degrees off my current Y axis, perform a suitable Lorentz transformation, and call that my new Y axis, and all these properties will continue to hold. The seemingly mind-blowing thing about relativity is that I can do that multiple times and my final Y axis will still never appear more than 45 degrees off the original Y axis.
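A quick sketch of why repeated re-tilting never escapes the 45 degree limit, assuming all the boosts are along the same space direction and using the standard relativistic velocity-addition formula in units where the speed of light is 1. The boost speed of 0.5 and the count of five are arbitrary illustrative choices.

```python
import math

def compose_velocities(u, v):
    """Relativistic addition of collinear velocities, in units where c = 1."""
    return (u + v) / (1 + u * v)

def tilt_degrees(v):
    """Angle on the chart between a worldline of speed v and the original Y axis."""
    return math.degrees(math.atan(v))

v = 0.0
for _ in range(5):
    # Each step boosts by half the speed of light relative to the previous axis.
    v = compose_velocities(v, 0.5)

print(v, tilt_degrees(v))
```

Naively adding the five boosts would give 2.5 times the speed of light, but the composed speed stays below 1, so the final axis is still tilted by less than 45 degrees.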
When viewing spacetime in this manner we need to rephrase our rules of physics. Instead of prescribing how space evolves from one moment in time to the next, they can be viewed as describing what spacetime looks like in practice. For example, given that a bunch of strings meet at a particular point, and you pick a random direction for each string and tell me what the string looks like in that direction, I can then tell you what the other half of each string looks like in the immediate vicinity of the meeting point (up until one of them meets another string).
Given a snapshot of the state of some particles in a partial cross-section of space at a particular point in time, we can (subject to computational constraints) predict the state of those particles indefinitely, so long as no other particles interact with them. Since no particles can travel faster than the speed of light, we can draw an inverted cone over the area where we know with certainty there are no foreign particles, and can predict all particles in that volume perfectly.
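The inverted cone can be sketched as a simple membership test. This assumes the snapshot covers a disk of some radius in the X/Z plane at Y = 0, in the scaled units where light moves one space unit per time unit; the disk shape and the function name are my own simplifications.

```python
import math

def is_predictable(x, z, y, radius):
    """True if the event (x, z, y) is fixed by a snapshot of the disk
    x^2 + z^2 <= radius^2 taken at y = 0, in units where c = 1.

    Nothing from outside the disk can reach the event by time y only if the
    event is at least |y| units inside the disk's edge. Using |y| makes the
    same test work for retrodicting the past.
    """
    return math.hypot(x, z) <= radius - abs(y)

print(is_predictable(0.0, 0.0, 1.0, 2.0))  # True: well inside the cone
print(is_predictable(1.5, 0.0, 1.0, 2.0))  # False: a foreign particle could have arrived
print(is_predictable(0.0, 0.0, 2.5, 2.0))  # False: above the tip of the cone
```

The predictable region shrinks by one unit of radius per unit of time and closes off entirely at the tip of the cone.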
This can be done inductively - by knowing the state of all particles at time t, we calculate the state of all particles at time t+1, then use that to calculate t+2 etc. Even though (if?) time isn't discrete, we can still calculate state to arbitrary precision by reducing the size of our steps.
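As a toy illustration of the step-by-step idea, here is a sketch that advances a single particle on a spring (position `x`, velocity `v`, with $x' = v$ and $v' = -x$) using plain explicit Euler steps, then compares against the known exact answer $\cos(t)$. The system and the integrator are stand-ins I picked for simplicity, not a claim about how real physics is stepped.

```python
import math

def euler_oscillator(t_end, steps):
    """Advance x' = v, v' = -x from (x, v) = (1, 0) to t_end in equal Euler steps."""
    x, v = 1.0, 0.0
    dt = t_end / steps
    for _ in range(steps):
        # Compute the state at the next instant purely from the current state.
        x, v = x + dt * v, v - dt * x
    return x, v

def position_error(steps, t_end=1.0):
    """Distance between the stepped position and the exact solution cos(t_end)."""
    x, _ = euler_oscillator(t_end, steps)
    return abs(x - math.cos(t_end))

for n in (10, 100, 1000):
    print(n, position_error(n))
```

The error shrinks roughly in proportion to the step size, which is the sense in which smaller steps buy arbitrary precision. Passing a negative `t_end` runs the same loop backwards in time.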
The same applies backwards in time of course.
Can we do the same in space? If I draw a cross section in spacetime orthogonal to the X axis, and tell you the state of any particles that ever pass through that cross section (including Y and Z coordinates), can we use that to predict the state of the particles at other points in spacetime?
The first problem is that there's no guarantee that foreign particles don't interact with the particles that pass through our cross-section so it's impossible to predict anything with certainty.
But let's say I guarantee that every single particle in the universe passes through our cross-section at least once, so we have some information on all particles. Can we then use this inductive process to iteratively calculate the state of all those particles across the rest of spacetime?
Unfortunately, the answer is still no: a particle could go far away from our cross-section, then return again, and then interact with a particle nanometres away from our cross-section, before shooting off into the distance for good. This will throw off our iterative calculations, which have no idea of its existence till they reach much further out. We can't be certain of the state of the universe even arbitrarily close to our cross-section.
Indeed there could be multiple possible spacetimes which have identical cross-sections through them. We're reduced to finding spacetimes that are consistent with our knowledge, and what's worse, there's no algorithm which can find them[1].
Consider now a Euclidean universe where all the dimensions are spacelike[2]. There's no maximum speed of light, and particles are free to turn back on themselves - indeed a particle might even be a loop.
Given our previous section it's clearly impossible to accurately predict the behaviour of particles in a Euclidean space. Does that mean that memory and action are impossible, since both require reconstructing the past or the future based on the state of the present?
Not necessarily, as we don't need perfect knowledge of the past or the future, just good enough. Perhaps we can still predict with sufficient accuracy for memory and action to work.
So can we have an arrow of time in a Euclidean universe?
Consider such a universe with a singularity. Every particle has one end at the singularity. From the singularity particles head out in every direction. Close to the singularity particles have extremely low entropy - particles are packed in very dense clusters, with much less dense areas between clusters, allowing for large energy gradients.
A bit further away from the singularity, entropy is still very low, and in our local space all particles are approximately parallel.
The particles interact in predictable ways, and by pure random chance we get self-perpetuating patterns in the particle strings: patterns which "cause" other particle strings further out to bend into the same patterns. This is possible because the energy gradient causes predictable behaviour which the patterns can exploit to modify the shape of other particles. Then evolution kicks in, and the rest, as they say, is history.
To a conscious being in such a universe their perception of time will be in the direction directly opposite to the singularity.
Could the same thing happen in our universe? Could the arrow of time point in a spacelike direction instead of a timelike one?
No. Entropy can't increase in a vacuum - it instead increases as particles become more disordered. The arrow of time always follows particles and their interactions. Particles move slower than the speed of light, and so their trajectories are always timelike.
And so our conclusion is: It's not that the time dimension is special and allows the arrow of time to exist, it's rather that the time dimension prevents the arrow of time from existing in a spacelike dimension by blocking particles from moving faster than $c$.
Which is perhaps a bit obvious, but I found the exercise of visualising all this helpful for better understanding the arrow of time.
[1] To be precise: In a universe with infinite particles, given a snapshot orthogonal to a time axis I can calculate the state of any point in spacetime to arbitrary precision in a finite amount of time. Given a snapshot in space orthogonal to a space axis I cannot.
[2] Greg Egan's story Orthogonal is set in such a Euclidean universe. His website contains a wealth of information about the physics of such a universe.