This was already discussed here:
It is possible, but not licensed by MSFT for client operating systems.
While Geoff Chappell's page (the one you linked to) is technically accurate about what it covers, it fails to mention that a 32-bit process cannot (at least on Windows via the Win32 API) execute code at a virtual address above the 4GB boundary, because its 32-bit registers and pointers can't hold an address that high. PAE still limits each process to a 4GB virtual window, no matter where you slice it. What PAE changes is that the physical pages backing that window can sit higher (above the 4GB boundary, for instance), so a 32-bit app's 2 or 3GB of user VA (and the 2 or 1GB of kernel VA) can be mapped onto RAM in that range when it's loaded. That is overhead for the OS (although probably not much, especially with today's processing speed and multi-core setups), and Microsoft only supports using it for >4GB on server-class OSes (or, more accurately, only planned to, since Server 2008 R2 is x64-only and Win7 will very likely be the last x86 client). The RAMDisk drivers that get discussed, while good and useful, still suffer some drawbacks from being PSE-36 based, namely all I/O being double-buffered and not being able to "slide" the 4GB mapped window the way you can with PAE and AWE.
Also, PAE adds overhead in the memory manager itself. Interprocess communication costs more. Context switches mean TLB reloads, and a bit more on multi-CPU systems, because the TLBs have to be synchronized so that every processor's virtual-to-physical mappings stay accurate (although Windows APIs do allow an application to batch this to reduce the performance hit). And because each PTE doubles from 32 to 64 bits under PAE, you're storing more PTE data in the kernel's session view space, even though the kernel address range is the same size with or without PAE, so kernel pool and session space can get cramped on large-memory 32-bit systems. There are more issues documented on TechNet and MSDN if you want to read up.
Or, you could use an x64 OS, avoid the overhead and restrictions Windows places on PAE, and run everything natively (or, for a 32-bit app, in WOW64 isolation, of course, with the side benefit that any 32-bit app linked with /LARGEADDRESSAWARE can actually use 4GB of VA instead of at most 3GB, because under WOW64 there's no kernel space inside the 32-bit VA). This is obviously easier for developers to handle, and x64 brings other benefits beyond that, but those are outside the scope of this discussion.
Ultimately, does it matter? I dunno. I don't think most folks use more than 4GB anyway, and those of us who do are using x64 for reasons other than native support for more than 4GB of memory. I still see PAE (and PSE) as the bridge Intel initially designed them to be: a way to use >4GB of RAM on x86 systems before its 64-bit platform (Itanium) was ready.