Post#24 » Wed Jan 13, 2021 6:39 am
Ok, here's the code. It works on a vanilla KOS install with the currently available kos-ports SDL. Compile with "make -f Makefile.dc". The ROM loaded is hardcoded in main.c, you'll have to change this to a romdisk or dcload file before you compile. Sorry for how messy the code is.
Controls:
D-Pad = D-Pad
Start = Start
A = C
B/X = B
Y = A
L/R = Quit
There are some keyboard debug buttons, you can see the code in input.c.
It looks like KOS moved the SDL and zlib includes into their own directories. I wanted to avoid going through every file and changing every "#include <SDL.h>" to "#include <SDL/SDL.h>", so I created a stub SDL.h that includes SDL/SDL.h and added it to the include path. Because I didn't have to modify every #include occurrence, you can easily find which files I actually changed by looking at the modification times.
The assembly files are all .S files and not .s files. This means that they use the C preprocessor. The included Dreamcast makefile, Makefile.dc, is set up to run them through the preprocessor before assembling, but the original Makefile and KOS's Makefile.rules default to NOT running the preprocessor on .S files. If you move the assembly files to another copy of Gens4All without changing the makefile to preprocess them, you'll get tons of errors. Preprocessed assembly is built with "kos-cc -c" instead of "kos-as".
I made some changes to the profiler, but I don't remember what they were exactly anymore. The bars on the bottom of the screen show CPU load of parts of the system. The tick marks are millisecond/frame positions.
Bright green: Total CPU time
Dark green: Frame time
Red: 68K emu (+sound time seems mixed in too?) time
Blue: FM synth time
Purple: SH4 draw time
Grey: Z80 emu time
One improvement I found for the original C YM2612 code was changing some global variables to local. They didn't need to be global, and the SuperH is worse at accessing global variables than stack allocated ones. This got something like a 5-10% speed up on YM2612 emulation.
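For anyone who wants to try the same trick elsewhere, here is a minimal sketch of the idea. The names (g_phase, g_step, render_global, render_local) are made up for illustration and are not taken from ym2612.c. This variant also covers state that does have to persist between calls: copy it into a local, run the loop on the local, and write it back once at the end.

```c
#include <stdint.h>

/* Hypothetical oscillator state, NOT from ym2612.c. */
static uint32_t g_phase;
static uint32_t g_step = 0x01234567u;

/* The hot loop works directly on the globals, so the compiler may have to
   go through memory for them on every iteration. */
void render_global(int16_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        g_phase += g_step;
        out[i] = (int16_t)(g_phase >> 16);
    }
}

/* Same output, but the loop only touches locals, which can stay in
   registers. The global is read once and written back once. */
void render_local(int16_t *out, int n)
{
    uint32_t phase = g_phase;
    const uint32_t step = g_step;

    for (int i = 0; i < n; i++) {
        phase += step;
        out[i] = (int16_t)(phase >> 16);
    }
    g_phase = phase;
}
```

Both functions produce identical samples; whether the second is actually faster depends on the compiler and flags, so it is worth checking with the profiler bars.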
It looks like the buggy asm LFO-YM2612 is currently enabled. You can revert to the C version by changing some #ifs in ym2612.c. I also tried getting the sound code to support nonstandard sample rates like 16kHz, to see if it could reduce CPU load without messing with the audio too much, but it didn't work out.
Here's how the renderer works:
First, the video RAM of the Genesis is converted to a texture. The PVR supports a 4-bit color format, but it's annoying to use because it always has to be twiddled. Instead, a non-twiddled VQ compressed ARGB4444 texture is used. The VQ compression codebook is basically a palette on steroids; instead of each palette entry representing one color, it represents four colors. On non-twiddled textures, those colors are arranged in a 4x1 rectangle. By creating the correct codebook/palette, it's possible to get the PVR to accept untwiddled 4-bit color textures. This makes converting VRAM to a texture much cheaper; it's pretty much a modified memcpy instead of a complicated ball of bittwiddling.
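A rough sketch of how such a codebook could be built from one 16-color palette. A PVR VQ codebook has 256 entries of four 16-bit texels (2048 bytes), and a Genesis VRAM byte holds two 4-bit pixels with the left pixel in the high nibble. The texel order inside each entry is an assumption here (each pixel doubled horizontally to fill the 4x1 block); the attached code may arrange it differently. build_codebook is a hypothetical name.

```c
#include <stdint.h>

#define VQ_ENTRIES       256
#define TEXELS_PER_ENTRY 4

/* pal: 16 colors already converted to ARGB4444.
   codebook: 256 entries x 4 texels = 2048 bytes. */
void build_codebook(const uint16_t pal[16],
                    uint16_t codebook[VQ_ENTRIES][TEXELS_PER_ENTRY])
{
    for (unsigned b = 0; b < VQ_ENTRIES; b++) {
        uint16_t left  = pal[b >> 4];    /* high nibble = left pixel  */
        uint16_t right = pal[b & 0x0F];  /* low nibble  = right pixel */

        /* ASSUMPTION: each Genesis pixel is doubled horizontally so that
           one VRAM byte fills one 4x1 codebook entry. */
        codebook[b][0] = left;
        codebook[b][1] = left;
        codebook[b][2] = right;
        codebook[b][3] = right;
    }
}
```

With a codebook like this, every possible VRAM byte value is its own VQ index, so the tile data can be copied over without unpacking nibbles.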
The PVR only really allows one codebook per VQ texture, but we need multiple palettes. One way would be to use multiple complete textures, but it would be nice to have one copy of the texture that we can palette swap. If we store the codebooks one after the other, followed by one set of indices, the codebook adjacent to the indices will work as expected, but trying to use the other codebooks will have data from some of the other codebooks as garbage at the top of the texture. We compensate for this by adjusting the UVs of the tile polygons drawn depending on which codebook is used, skipping over the garbage data. A big table is used to precalculate the UVs; the palette, flipping, and tile index are all used to index into the table and return the correct UVs for the tile.
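To make the UV adjustment concrete, here is a small sketch of how the number of garbage rows could be computed. It assumes the layout described above (codebooks back to back, then the indices), a 2048-byte codebook, and one index byte covering a 4x1 block of texels. The function names and the texture width parameter are hypothetical; the real table in the attached code also folds in flipping and the tile index.

```c
#define VQ_CODEBOOK_BYTES 2048u  /* 256 entries * 4 texels * 2 bytes */
#define TEXELS_PER_INDEX  4u     /* one index byte = one 4x1 block   */

/* Rows of garbage at the top of the texture when it is bound starting at
   codebook number `codebook` (0 = first in memory). Every codebook stored
   after the selected one gets read as index data. */
unsigned garbage_rows(unsigned codebook, unsigned num_codebooks,
                      unsigned tex_width_texels)
{
    unsigned trailing = num_codebooks - 1u - codebook;
    unsigned garbage_texels = trailing * VQ_CODEBOOK_BYTES * TEXELS_PER_INDEX;
    return garbage_texels / tex_width_texels;
}

/* Texture row of a tile's first line, shifted past the garbage.
   `row_in_vram_texture` is the row the tile would have with no garbage. */
unsigned tile_top_row(unsigned row_in_vram_texture, unsigned codebook,
                      unsigned num_codebooks, unsigned tex_width_texels)
{
    return row_in_vram_texture
         + garbage_rows(codebook, num_codebooks, tex_width_texels);
}
```

For example, with four codebooks and a 1024-texel-wide texture, each extra codebook costs 8 rows, so the first codebook needs its V coordinates pushed down by 24 rows and the last one by none.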
An ARGB4444 color format is used because it requires the least bit manipulation to convert a Genesis palette to. The only difference is that the positions of the red and blue channels are swapped. It can also potentially be used for a shadow/highlight mode.
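As a sketch of that conversion: a Genesis CRAM entry is laid out as 0000BBB0GGG0RRR0, and ARGB4444 is AAAARRRRGGGGBBBB, so green stays where it is and red and blue trade places. The function name is hypothetical, and it assumes the emulator hands over palette entries in the raw hardware layout; setting alpha to fully opaque is also a choice made here.

```c
#include <stdint.h>

/* Convert one raw Genesis CRAM color (0000BBB0GGG0RRR0) to ARGB4444. */
uint16_t genesis_to_argb4444(uint16_t c)
{
    uint16_t r = c & 0x000E;  /* red in bits 3..1    */
    uint16_t g = c & 0x00E0;  /* green in bits 7..5  */
    uint16_t b = c & 0x0E00;  /* blue in bits 11..9  */

    /* Opaque alpha, red moved up, green kept, blue moved down. */
    return (uint16_t)(0xF000 | (r << 8) | g | (b >> 8));
}
```

Since the Genesis only has 3 bits per channel, full white comes out as 0xFEEE rather than 0xFFFF.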
Using a DC emulator with a texture viewer, it looks like Smash Pack also converted tiles into textures, but did one texture per tile/palette combo instead of putting all the tiles in one palette swapped texture. Then it cached unchanged tiles between frames. The slowdown Smash Pack has during full screen palette effects (Vectorman is notable) is probably because it has to regenerate all tiles. The method I use doesn't need any tile cache management and has a very consistent speed no matter what is going on.
Every tilemap and sprite is walked through, and each tile is then drawn as a textured quad.
Priority is approximated by setting the depth of the tile polygons. Low priority tiles have a lower depth value than high priority tiles. This has issues with the sprite layer. On a real Genesis, it's possible for a low priority sprite to be drawn over a high priority sprite (this can be used to create some interesting masking effects with tilemaps). This isn't supported; low priority sprites will always be drawn below high priority sprites, but it doesn't come up very often.
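A minimal sketch of a depth assignment along these lines. The usual Genesis back-to-front order is plane B, plane A, sprites, first for low priority and then again for high priority. The enum, function name, and the actual depth values are hypothetical; only the ordering matters, and the window plane is left out.

```c
enum layer { LAYER_PLANE_B = 0, LAYER_PLANE_A = 1, LAYER_SPRITE = 2 };

/* Larger return value = drawn in front. Every high priority layer sits
   above every low priority layer. */
float tile_depth(enum layer layer, int high_priority)
{
    int slot = (high_priority ? 3 : 0) + (int)layer;  /* 0..5 */

    /* Hypothetical scale: any strictly increasing mapping would do. */
    return 1.0f + (float)slot;
}
```

Note that this bakes in the limitation mentioned above: a low priority sprite always ends up behind a high priority sprite, whatever their order in the sprite list.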
The background color is used to draw a solid color quad behind the tiles.
There are two copies of the Genesis VRAM texture in the DC video RAM. One copy is actively being rendered from by the PVR, the other copy is the one to be used for the next frame. If you didn't double buffer them, rapidly changing tiles (like character animation) might get the wrong tiles when drawn, depending on the progress of the PVR and when the DMA occurs.
There are many improvements that can be made to the renderer:
Get 32-bit color working on vanilla KOS. My old modified PVR driver supported 24/32 bit color, but it doesn't seem to be working on an unmodified KOS. In the meantime, dithering is disabled.
The bottom row of tilemaps are drawn incorrectly depending on vertical scrolling.
The texture generation can probably be sped up some. It reads and writes the tile data as 32-bit values for speed, but currently has to byteswap the values. It should be possible to get rid of these byteswaps by flipping the tile UVs horizontally and changing the palette order.
The code currently DMA's the VRAM texture, blocks until it's finished, then submits the tile polygons. This blocking is wasted time, but it's not possible to DMA a texture and submit polygons to the TA at the same time. The data written will get corrupted if you try. I think if this is changed to submitting the polygons, then DMAing the texture, you can do a nonblocking DMA and get emulation work done while the texture is being transferred. I'm not totally sure if this is possible/safe to do with KOS's PVR API. You would need to ensure rendering can't start before the texture is transferred, so the pvr_list_finish and pvr_scene_finish calls would have to be done in the DMA complete callback... (IIRC, the pvr_scene_finish is mostly for show when you aren't using TA DMA, so delaying pvr_list_finish would be the important part to ensure rendering can't occur before the texture is transferred.)
Some line scroll support could be added by skewing a row of a tilemap based on the line scroll values. Smash Pack does this. I already tried putting partial support in, but getting it to work robustly is complicated. You'll have to draw more tiles to fill in the holes created by the skewing. Smash Pack often has errors on the left edge of the screen when doing line scroll. (I've looked into doing PVR Genesis renderers that draw lines instead of tiles, but it wasn't clearly worth the overhead of sending and processing 8 times the number of quads vs doing it completely in software.)
Adding additional codebooks could add some support for palette change raster effects, like the water in Sonic games. The color change line could be implemented by either software clipping the waterline tiles, or using modifier volumes to switch textures and UVs for part of the screen. Some games change the background color per line; this could be supported by drawing background plane quads per line instead of per screen.
Some games letterbox the screen by turning the display off for part of the frame (e.g. Gunstar Heroes intro). This could be supported by drawing quads with the background color on top of the screen.
Shadow/highlight isn't implemented, and 100% correct emulation isn't possible, but an approximation could be done by making the shadow/highlight pixels 50% transparent black/white.
Vertical cell scrolling could be supported when full screen horizontal scrolling is used. I don't think it's a common setting, though. Maybe MUSHA does it? Cell/line + vertical cell scrolling would cause tiles to tear, and isn't really feasible to do.
It looks like I made an attempt at supporting tile map windowing for Revenge of Shinobi's status bar, but it's a hack.
I think the original frame skipping code was disabled/bypassed when I jammed the PVR code in there, and I tried to add some kind of basic auto frameskip, but I disabled it. I'm not sure if it works. Probably not.
Attachments:
gens4all_pvr.7z (606.53 KiB)