
While you're right about what the CPU's problems were, methinks you somewhat underestimate the challenge involved, which they took on against all odds (or common sense).
The fact is, all modern CPUs since the (IIRC) K5 already do code-morphing in silicon. They're micro-coded designs, although everyone calls them RISC-ified (or some similar thing) just for buzzword's sake. Any modern CPU translates complex instructions, as seen by a compiler or program, into several simpler internal instructions and then executes those. They even have far more registers for those internal instructions than you see from the outside.
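To make that concrete, here's a toy sketch of the idea. The instruction tuples, the micro-op names and the `decode` function are all made up for illustration; no real CPU uses this format, and a real decoder is far more involved.

```python
# Toy illustration only: the instruction format and micro-op names below are
# hypothetical and do not correspond to any real CPU's internals.
from itertools import count


def decode(instr, temps):
    """Split one 'complex' instruction into simpler load/op/store micro-ops.

    instr is (op, dst, src); an operand written like "[mem]" is a memory
    operand, anything else is a register. temps hands out numbers for
    internal temporary registers that are invisible to the program.
    """
    op, dst, src = instr
    uops = []
    if src.startswith("["):
        t = f"t{next(temps)}"
        uops.append(("load", t, src))   # fetch the memory source first
        src = t
    if dst.startswith("["):
        t = f"t{next(temps)}"
        uops.append(("load", t, dst))   # read-modify-write on memory
        uops.append((op, t, src))
        uops.append(("store", dst, t))
    else:
        uops.append((op, dst, src))     # register-only: one micro-op
    return uops


temps = count()
uops = decode(("add", "[mem]", "eax"), temps)
for u in uops:
    print(u)
```

One instruction as the compiler sees it becomes a load, an add and a store internally, using a temporary register the program never gets to name.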
So Intel and AMD _already_ do some sort of code morphing, and did it long before Transmeta.
And I think you'll find it's quite a challenge to do that in software faster than dedicated silicon can do it.
But wait, here comes the best part. Any modern CPU has a very dynamic and advanced approach to managing that queue of micro-instructions. If one of them needs data that hasn't arrived yet, the others can be reordered in flight, they can change lanes, etc. And the CPU can even take leaps of faith on prediction, and generate code based on what will _probably_ happen (e.g., this particular loop most often jumps to the start, and only rarely exits, so let's go ahead and predict that it will jump there this time too), knowing that it can invalidate the pipeline and try again if that's false. Etc.
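For the prediction part, here's a minimal sketch of one classic textbook scheme, a 2-bit saturating counter. I'm not claiming any particular Intel or AMD chip uses exactly this; the `simulate` function and the loop history are assumptions made up for the example.

```python
def simulate(outcomes, state=2):
    """Count how many branch outcomes a 2-bit saturating counter gets right.

    state is 0..3: 0-1 predicts "not taken", 2-3 predicts "taken".
    outcomes is a sequence of booleans (True = branch was taken).
    """
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        if predicted == taken:
            correct += 1
        # Nudge the counter toward what actually happened, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical loop branch: jumps back 9 times, then exits, repeated 3 times.
history = ([True] * 9 + [False]) * 3
hits = simulate(history)
print(f"{hits} of {len(history)} predicted correctly")
```

The only misses are the loop exits, and a miss costs the hardware a pipeline flush, not a recompile.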
That's all stuff that a code morpher can't do. If it predicted an optimization wrong, there's no way to have instructions reordered in memory, or switch lanes in a VLIW program, or lots of other things. It can just wait for the data (wasted cycles) or recompile the code (a heck of a lot _more_ expensive).
Branch prediction is an even bigger bitch when you compile that code instead of having some silicon generate it on the fly. Pretty much you _can't_ do it. Your code will have to assume that when you reach a conditional jump, it could go either way, and it must work right either way. You can't assume it'll always go one way, and recompile if you missed, because recompiling is a huge performance hit, far exceeding any advantages you might get from betting on one branch in advance.
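A back-of-the-envelope version of that argument, as a sketch. Every number here is invented to show the shape of the trade-off, not measured on any real chip or on Transmeta's software, and `net_cycles_saved` is a hypothetical helper. It also assumes, as I do above, that every wrong bet forces a recompile.

```python
def net_cycles_saved(hits, misses, gain_per_hit, recompile_cost):
    """Cycles gained by betting on one branch direction, minus cycles lost
    recompiling whenever the bet turns out wrong. All inputs are assumptions."""
    return hits * gain_per_hit - misses * recompile_cost


# Invented numbers: 900 correct bets saving 5 cycles each,
# 100 wrong bets costing a 10,000-cycle recompile each.
result = net_cycles_saved(hits=900, misses=100, gain_per_hit=5, recompile_cost=10_000)
print(result)
```

With a miss penalty that many times larger than the gain per hit, the bet only pays off if misses are extremely rare.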
Basically it's a freakin' miracle that it even worked as well as it did. They must have some pretty smart people there.
But they're already pushing the limits of what's even theoretically possible with that design. _That_ is their problem.
Basically saying that they should capitalize on that and focus on making it work as fast as dedicated silicon is... well, sorta like saying that they should have focused on finding the Philosopher's Stone and turning lead into gold. It's just not going to happen. It's not even theoretically possible. Sorry.