Gens4All with Z80 Emulation

TapamN
letterbomb
Posts: 150

Re: Gens4All with Z80 Emulation

Post by TapamN »

Progress update:

Rendering overhead has been reduced to around 2.0-2.2 ms. Originally, when creating the Genesis VRAM texture, it would write to main RAM, then DMA it into the DC's video memory. It would have to wait for DMA to complete, since it interfered with writing to the TA. Now it writes directly to VRAM using SQs. When I first wrote it, trying to SQ to VRAM was much slower than writing to main RAM then DMAing to VRAM, but I've learned that it's possible to SQ to video RAM at DMA speeds. You can't use the address KOS returns when allocating VRAM, you have to convert it to the address used for DMA (basically, replace the top byte of the pointer with 0x11), set some of the DMA registers, then SQ there.
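The address conversion described above can be sketched in C as follows. This is only a minimal illustration of "replace the top byte of the pointer with 0x11"; the helper name `vram_to_dma_area` is hypothetical, the 32-byte alignment check is an assumption based on SQs writing 32-byte blocks, and the DMA register setup the post mentions is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: take a VRAM address as handed out by the allocator
   and replace its top byte with 0x11, giving the address the post says is
   used for DMA. The DMA register setup is deliberately left out. */
static uint32_t vram_to_dma_area(uint32_t vram_addr)
{
    /* Assumption: the destination starts on a 32-byte boundary, since
       store queues write 32-byte blocks. */
    assert((vram_addr & 0x1Fu) == 0);
    return (vram_addr & 0x00FFFFFFu) | 0x11000000u;
}
```

For example, an allocation at 0xA4123460 would map to 0x11123460 under this scheme.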

Partial support for color change raster effects has been added. See the first row of screenshots (left image is before, right is after). It mostly works, but at the moment, the color change happens on a per-tile basis (On the screenshots, the color change lines up closely with the tile map tiles, so it's not too noticeable there. The cyan dot on the left marks where the color change is supposed to be.). For tiles that cross the color change line, I'll need to cut the quad in two at the line and draw each half with the correct palette. Only one color change is supported per frame.

There's now Smash Pack style line scroll faking. It looks at the line scroll values for the top and bottom rows of a tile and skews the tile to fake line scroll. See the second row of images. Left is before, right is with the line scroll faking.
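A rough sketch of the skewing idea, not taken from the Gens4All source: the `skewed_tile` struct, the `skew_tile` function and the one-value-per-scanline `hscroll` layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_SIZE 8

/* Hypothetical quad description: x position of each corner. */
typedef struct {
    int top_left_x, top_right_x;
    int bottom_left_x, bottom_right_x;
} skewed_tile;

/* hscroll: one horizontal scroll value per scanline (assumed layout).
   tile_x: unscrolled x of the tile, tile_y: its first scanline.
   Only the first and last line of the tile are looked at; the lines in
   between are implied by the straight edges of the quad. */
static skewed_tile skew_tile(const int16_t *hscroll, int tile_x, int tile_y)
{
    skewed_tile q;
    int top = hscroll[tile_y];
    int bottom = hscroll[tile_y + TILE_SIZE - 1];

    q.top_left_x = tile_x + top;
    q.top_right_x = q.top_left_x + TILE_SIZE;
    q.bottom_left_x = tile_x + bottom;
    q.bottom_right_x = q.bottom_left_x + TILE_SIZE;
    return q;
}
```

When the top and bottom values are equal the quad is an ordinary square tile, so plain cell scroll falls out as a special case.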

There are some problems, though. It works well for something like the floor in a fighting game, since the lines in between the top and bottom of a tile change by similar amounts. The stars in Gunstar Heroes' space areas would have problems since each line has a completely different scroll value. Another problem case is infinitely scrolling parallax, like Sonic 2's Emerald Hill background. Eventually, it wraps around and the rendering code thinks it's supposed to be skewed the other way. The floor in Toy Story or the icebergs in S3's Ice Cap wouldn't have this problem because they don't really scroll forever. They scroll one repetition of the pattern, then jump back to keep it from scrolling too far.
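One way the wrap-around flip could come about, shown as a small C sketch. This is an assumption for illustration, not the actual Gens4All logic: it supposes scroll values are only known modulo the plane width and that the renderer picks the shorter of the two possible skew directions. `wrapped_delta` is a hypothetical name.

```c
#include <assert.h>

/* Signed shortest difference between two scroll values that are only
   known modulo `width` (assumed to be the plane width in pixels). */
static int wrapped_delta(int top, int bottom, int width)
{
    int d = (bottom - top) % width;
    if (d < 0)
        d += width;          /* now 0 <= d < width */
    if (d >= width / 2)
        d -= width;          /* pick the shorter direction */
    return d;
}
```

With a 512 pixel wide plane, rows that are 200 pixels apart give a skew of +200, but once they drift to 300 pixels apart the same calculation gives -212, so the tile suddenly leans the other way even though the rows kept moving in the same direction.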

The 3rd row of screenshots shows pretty much every flaw with this method of faking line scroll. The area around the rings is skewed one way in the left shot, but in the right shot, after scrolling a bit more, the skew direction switches.

Something I wasn't expecting with tile skewing is that at certain steep angles of skew, there are incorrect pixels drawn. (See the out of place pixels in the red box in the last image.) I think the PVR's texturing math has precision issues and it's sampling texels from an adjacent tile. It might be possible to hide or work around this. Maybe detect problem skew values and replace them with similar ones without the problem?

The line scroll method probably needs to be selectable by the user, or the line scroll values need to be scanned and the renderer tries to pick the best option. Subdividing tiles (i.e. rendering 8x4 quads) for greater line scroll resolution might help.

I'm going to work on getting save states working, since that will help with testing. It looks like when switching to the SH4 68K core, the method for reading and writing 68K state changed, but the save state code was #if 0'ed out instead of being updated.

I think I've figured out a possible improvement on the sound handling by skipping SDL. Currently, sound works like this:
  • The Genesis emulation writes 32-bit samples generated in one frame to two buffers (one per channel)
  • At the end of the frame, these samples are clamped to 16-bits and written to a single buffer, with channels interleaved
  • SDL thread periodically checks the AICA playback position
  • When the AICA has played enough, it requests samples from the buffer
  • SDL de-interleaves the samples when writing to sound RAM
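The last step of the list above, de-interleaving, amounts to something like the following. This is a plain C sketch rather than SDL's actual code; the function name and the left-then-right channel order are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved L,R,L,R,... buffer into one buffer per channel.
   `frames` is the number of stereo sample pairs. */
static void deinterleave_s16(const int16_t *interleaved, size_t frames,
                             int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i] = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```

Since the emulator already produces one buffer per channel, the interleave at the end of the frame and this de-interleave cancel each other out, which is the redundant work being pointed at.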
I think the fastest thing to do would be this:
  • The Genesis emulation writes 32-bit samples generated in one frame to two buffers (one per channel)
  • At the end of the frame, DMA these buffers to sound RAM and let the ARM deal with it
I know the ARM is pretty slow, but even it should be able to do what's basically a fancy memcpy fast enough. It's an LDM, two CMPs and MOVcc's, plus an STRH per sample per channel in a loop. If the ARM still turns out too slow, removing the buffer in between what the emulator generates and what the AICA reads from would still help. The clamping could be done while waiting for the G2 FIFO to clear.
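In C terms, the per-sample work described for the ARM is roughly the following. This is a portable sketch with hypothetical function names, not ARM code and not taken from the emulator; the two comparisons correspond to the two CMP/MOVcc pairs, and the 16-bit store to the STRH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp one 32-bit sample into the signed 16-bit range. */
static int16_t clamp_s16(int32_t s)
{
    if (s > INT16_MAX)
        s = INT16_MAX;
    if (s < INT16_MIN)
        s = INT16_MIN;
    return (int16_t)s;
}

/* Clamp one channel's worth of samples for a frame. */
static void clamp_channel(const int32_t *in, int16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = clamp_s16(in[i]);
}
```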
Attachments
gens4all shots.png

FlorreW
Animated Violence
Posts: 499
Dreamcast Games you play Online: Quake 3 sometimes

Post by FlorreW »

Thank you for trying to improve this emulator, Mr. TapamN. It would be really nice to have a close to 100% working Genesis emulator for the Dreamcast.

SMiTH
Black Mesa
Posts: 1497

Post by SMiTH »

TapamN, I found the ljsdcgen Version 1 (generator v.34 based) source the other day on the Web Archive. It is bluecrab's Sega Genesis emulator. Maybe something useful for ya, idk?

Release notes for ljsdcgen (Version 1 (generator v.34 based)):

- Frameskip is set to 7, you can adjust this with the joystick, up and down
- Controls are mapped as follows (on a dreamcast controller):
    Genesis A     -> DC X
    Genesis B     -> DC A
    Genesis C     -> DC B
    Genesis Start -> DC Start
    Genesis DPad  -> DC DPad
- Please let me know of any issues that come up.
- Make sure that the fonts directory and its contents are on the disc.
- All games must be in the roms directory!
- Please do not put more than 1024 roms on the disc.
- There is no sound yet.
- This is a test release! Please do not expect it to be anywhere near perfect!
- Please read the screen that comes up at boot-time! Understand it! Adhere to it!
Attachments
ljsdcgen-v1-bin.tar.bz2
(563.22 KiB) Downloaded 193 times
ljsdcgen-v1-src.tar.bz2
(216.14 KiB) Downloaded 243 times

Ian Micheal
Developer
Posts: 6035
Location: USA

Post by Ian Micheal »

The AICA is just very slow. I tried to do something like that with DreamNeo CD, and it turned out the current SDL I gave you was still faster, which was a little shocking. I'd love to see someone do it. BA tried that with the AICA and it was super slow as well.

With Smash Pack they had trouble with sound; they just lowered the pitch and faked it underclocked, and it's still bad.

Run a benchmark on how slow the AICA really is and you will get the idea.

Ian Micheal
Developer
Posts: 6035
Location: USA

Post by Ian Micheal »

TapamN wrote: Rendering overhead has been reduced to around 2.0-2.2 ms. Originally, when creating the Genesis VRAM texture, it would write to main RAM, then DMA it into the DC's video memory. It would have to wait for DMA to complete, since it interfered with writing to the TA. Now it writes directly to VRAM using SQs. When I first wrote it, trying to SQ to VRAM was much slower than writing to main RAM then DMAing to VRAM, but I've learned that it's possible to SQ to video RAM at DMA speeds. You can't use the address KOS returns when allocating VRAM, you have to convert it to the address used for DMA (basically, replace the top byte of the pointer with 0x11), set some of the DMA registers, then SQ there.
My updated port of Gensplus SDL, which I ported in 2003, uses DMA with zero frameskip and no limiter. As you can see, parts run well over speed.

Speed is not a problem; in fact, it's too fast.

Why go to the trouble of using SQs? Could you not use DMA, and what was the change in time between this and the other?

I'm trying to understand why you want to use SQs. Can't you use them to clear the screen while using DMA?

I have always seen DMA being up to 1 MB faster in output, even on your own benchmark.

Darkness layer 2, which is not included in Katana, uses DMA, not SQs. It is used in quite a few games, was used on first-party Sega titles, and is what many devs talked about.

OUTPUT:> ----------SQ/Cache test----------
OUTPUT:> Cache read: 00c0ffee
OUTPUT:> Cache invalidate read: deafbeef
OUTPUT:> SQ did NOT flush cache, we got the old value
OUTPUT:> We got the new results from the SQ after invalidating the cacheline and reloading
OUTPUT:> ----------Bandwidth tester----------
OUTPUT:> ---RAM SQ Bandwidth Test---
OUTPUT:> RAM SQ Bandwidth: 520.42 Mbyte/sec
OUTPUT:> ---RAM Cache Bandwidth Test---
OUTPUT:> RAM cache Bandwidth: 518.94 Mbyte/sec
OUTPUT:> ----EOS EVERY 16777216 VERTS----
OUTPUT:> ---TA Off Screen SQ Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR TA SQ Off Screen Bandwidth: 244.73 Mbyte/sec
OUTPUT:> PVR TA SQ Off Screen Vert Speed: 7.65 MVert/sec
OUTPUT:> PVR TA SQ Off Screen Tri Speed: 7.65 MTri/sec
OUTPUT:> ---TA Off Screen DMA Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR TA DMA Off Screen Bandwidth: 246.00 Mbyte/sec
OUTPUT:> PVR TA DMA Off Screen Vert Speed: 7.69 Mvert/sec
OUTPUT:> PVR TA DMA Off Screen Tri Speed: 7.69 MTri/sec
OUTPUT:> ---TA On Screen SQ Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR TA SQ On Screen Bandwidth: 229.60 Mbyte/sec
OUTPUT:> PVR TA SQ On Screen Vert Speed: 7.17 MVert/sec
OUTPUT:> PVR TA SQ On Screen Tri Speed: 7.17 MTri/sec
OUTPUT:> ---TA On Screen DMA Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR TA DMA On Screen Bandwidth: 234.75 Mbyte/sec
OUTPUT:> PVR TA DMA On Screen Vert Speed: 7.34 Mvert/sec
OUTPUT:> PVR TA DMA On Screen Tri Speed: 7.34 MTri/sec
OUTPUT:> ---TA Big Poly SQ Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR SQ Big Poly Bandwidth: 28.50 Mbyte/sec
OUTPUT:> ---TA Big Poly DMA Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR DMA Big Poly Bandwidth: 28.30 Mbyte/sec
OUTPUT:> ---TA Big Sprite SQ Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR SQ Big Sprite Bandwidth: 8.58 Mbyte/sec
OUTPUT:> ---TA Big Sprite DMA Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR DMA Big Sprite Bandwidth: 10.15 Mbyte/sec
OUTPUT:> ---Texture DMA Test---
OUTPUT:> pvr: enabling vertical scaling for non-VGA
OUTPUT:> PVR VRAM64 DMA Bandwidth: 604.98 Mbyte/sec
In all the benchmarks, DMA was the same as or faster than SQ, and I have found this to be the case when things get demanding. DMA does not bounce around or stall compared to SQ. The difference between using DMA and using SQs on DreamNeo CD is pretty vast: up to 10 fps slower using SQs.

So you're saying you're gaining time by not waiting for DMA?

Here is my own benchmark of SDL DreamHAL compared to Chui's, or what's included in KOS.

fixed some bugs now have more speed

SDL CHUI1.2.9 kos 1.3
OUTPUT:>                           320x240  320x240  640x480  640x480
OUTPUT:>                           software hardware software hardware
OUTPUT:> Slow points (frames/sec):  0.20885 0.208823 0.0250253 0.0250252
OUTPUT:> Fast points (frames/sec):  18.4372  18.4544  4.47263  4.47271
OUTPUT:>    Rect fill (rects/sec):  652.125  652.333  156.312    156.3
OUTPUT:>  32x32 blits (blits/sec):  1327.28  1327.28  1277.21   1277.6
OUTPUT:> arch: shutting down kernel
OUTPUT:> maple: final stats -- device count = 2, vbl_cntr = 56270, dma_cntr = 56266
OUTPUT:> vid_set_mode: 640x480 NTSC
STATE:> Upload processus completed on 6/17/2021 - 09:31:53, Exit Code : 0

SDL DREAMHAL IAN MICHEAL
OUTPUT:>                           320x240  320x240  640x480  640x480
OUTPUT:>                           software hardware software hardware
OUTPUT:> Slow points (frames/sec):  0.41898  1.03079 0.0501951 0.129862 
OUTPUT:> Fast points (frames/sec):   37.021  65.5234  8.96515  16.5076 
OUTPUT:>    Rect fill (rects/sec):  1318.31  2645.99  314.424   700.77 
OUTPUT:>  32x32 blits (blits/sec):  3056.72  3934.68  2868.35  3927.13 
OUTPUT:> arch: shutting down kernel
OUTPUT:> maple: final stats -- device count = 2, vbl_cntr = 19933, dma_cntr = 19919
OUTPUT:> vid_set_mode: 640x480 VGA
My version of SDL has almost 2x faster blit speed. This is why I got 35 fps on a straight DOS Quake port compared to 18 fps with the SDL currently included in KOS. Not only does the benchmark match, but with a simple recompile against any other SDL you lose almost half the speed. And this is using PVR DMA.

What is the time saved by trying to avoid one of the console's features? Is it just good for this emulator?

I can tell you Darkness layer 2 uses DMA for rendering, not SQs.

I know DMA can steal bandwidth, and I have seen cases where it was slower. I'm just trying to understand why you always try to use SQs over what Sega did with their Darkness layer 2, which was to avoid SQs and use DMA for rendering. If you don't have the layer, it was Sega internal source and I can send it to you.

MastaG
Quad Damage
Posts: 204

Post by MastaG »

Seeing the two of you talking about this really makes me horny! :P
TapamN, if you create a Patreon account I'd be happy to support you as well.

For me personally, any improvements to SNES and Genesis emulators are most welcome.
That and full-blown games such as Intrepid Izzy.
Because I prefer adventure games (mostly platformers) over arcade stuff, fighting games and horizontal shooters lol.

The only thing both of you could do, imo, is improve your workflow a bit.
E.g. there are so many changes to KOS and its ports, such as SDL.

Why not spend some time forking the official KOS repository to your own GitHub account and simply commit all of your changes and improvements so everyone can benefit?

Then everyone can see the commit logs, check it out and build it themselves.
Now if somebody posts his source for some project, I'm unable to build it due to a heavily changed kos toolchain and ports which I don't have:)

dcsteve
undertow
Posts: 27

Post by dcsteve »

Amazing progress Tap! Great to hear you reduced overhead. Excited to see how your ARM processing test goes. Improving sound quality while maintaining the currently great video sync and framerate sounds fantastic. For the games requiring hacks, would it be easier to make a user menu setting to manually activate the hacks common among a set of many games?

TapamN
letterbomb
Posts: 150

Post by TapamN »

New version. Main additions are color rasters, saving SRAM to VMU, an in-game pause menu, better controller support (6 button and 2 player), better line scroll support, and partial shadow/highlight support.

First, some replies:
Ian Micheal wrote: My updated port of Gensplus SDL, which I ported in 2003, uses DMA with zero frameskip and no limiter. As you can see, parts run well over speed.
So it does VDP rendering on the CPU, and it's too fast? I tried coming up with a fast software renderer, but drawing and combining the layers seemed too much for the SH4 to do while leaving time for CPU and sound emulation. I tried some hybrid software and hardware renderers, but they didn't turn out well. What is yours doing that makes it fast? What kinds of raster effects does it support? How consistent is its speed? Is there any point to working on Gens4All?
Ian Micheal wrote: In all the benchmarks, DMA was the same as or faster than SQ, and I have found this to be the case when things get demanding. DMA does not bounce around or stall compared to SQ. The difference between using DMA and using SQs on DreamNeo CD is pretty vast: up to 10 fps slower using SQs.
Those are just raw bandwidth figures in a vacuum. What matters is overall system speed in real world conditions. Theoretically, DMA is slower than SQs. If you DMA something that the CPU generates, the process looks like this:
  1. CPU reads source data
  2. CPU writes results to main RAM
  3. DMA reads results from main RAM
  4. DMA writes to hardware
With SQs, you just have
  1. CPU reads source data
  2. CPU writes results to hardware
...which seems faster and more efficient, but that's not always the whole story. For example, writing to hardware can be slow, so using DMA can allow the DMA controller to be slowed down instead of the CPU. I was using DMA for transferring the texture before because I didn't know how to do fast texture SQs.
Ian Micheal wrote: What is the time saved by trying to avoid one of the console's features?
SQs are a feature, too. It's about picking the right one.

With my version of Gens4All, rendering is done with the PVR. The Genesis's VRAM is converted into a texture for the PVR. The texture is a specially prepared VQ compressed texture to allow for faster conversion of 4-bit data from the Genesis's VRAM than is possible with the PVR's normal 4-bit texture format. Creating the texture took four steps originally:
  1. Generate palette codebooks
  2. Reorder tile data into texture
  3. Flush cache so the results are in main RAM
  4. DMA texture from main RAM to texture RAM
It's not possible to do texture DMA while sending commands to the TA. The TA gets confused and the system hangs, so you have to sit and wait for DMA to complete. (Another option would be to send commands to the TA, then do DMA, and hope that the DMA completes before KOS decides to trigger rendering. It would probably work most of the time, but I didn't try it.)

I profiled the original version of the texture generation that was used. The different parts took this long:

CB    0.10 ms
VRAM  0.97 ms
Flush 0.04 ms
DMA   0.28 ms
Total 1.39 ms
Getting rid of DMA allows skipping the Flush and DMA sections. With the SQ version, it now takes this long:

CB    0.10 ms
VRAM  0.49 ms
Flush 0.00 ms
DMA   0.00 ms
Total 0.59 ms
The need to flush the cache and do DMA has been eliminated, and converting Genesis VRAM into a texture is faster. Why did the texture conversion get faster? It's possible to have the SQ version write to main RAM/cache, not using the SQs at all. It runs at about the same speed as the original version. This makes it possible to experiment with it to get an idea of where the speed increase comes from.
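For reference, the two profiles above add up as follows. The struct and function names in this small C sketch are made up for illustration; the numbers are the ones from the two tables.

```c
#include <assert.h>

/* Per-step timings in milliseconds, as listed in the two profiles. */
typedef struct {
    double cb, vram, flush, dma;
} tex_profile_ms;

static const tex_profile_ms dma_version = {0.10, 0.97, 0.04, 0.28};
static const tex_profile_ms sq_version  = {0.10, 0.49, 0.00, 0.00};

static double total_ms(tex_profile_ms p)
{
    return p.cb + p.vram + p.flush + p.dma;
}

/* Time saved per frame by the SQ version, in milliseconds. */
static double saved_ms(void)
{
    return total_ms(dma_version) - total_ms(sq_version);
}
```

That is 1.39 ms against 0.59 ms, so about 0.80 ms saved per frame, of which 0.48 ms comes from the faster VRAM reordering and 0.32 ms from dropping the flush and DMA steps.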

The original version went through each tile and wrote it out to 8 lines of the texture. This doesn't work well for SQs, which write 32 byte blocks instead of 4 bytes. The new version takes a row from 8 different tiles and writes one block to the SQ.
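The "one row from 8 different tiles" access pattern can be sketched like this. It only shows the gathering of 8 four-byte tile rows into one 32-byte block; the conversion into the VQ texture format is not shown, and the choice of 8 consecutive tiles, the function name and the constants are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    TILE_BYTES = 32,      /* 8x8 pixels at 4 bits per pixel */
    ROW_BYTES = 4,        /* one 8 pixel row of a tile */
    TILES_PER_BLOCK = 8   /* 8 rows of 4 bytes fill a 32-byte block */
};

/* Collect row `row` of 8 consecutive tiles, starting at `first_tile`,
   into one 32-byte block that could then be written out in one go. */
static void gather_row_block(const uint8_t *vram, unsigned first_tile,
                             unsigned row, uint8_t block[32])
{
    assert(row < 8);
    for (unsigned t = 0; t < TILES_PER_BLOCK; t++) {
        const uint8_t *src =
            vram + (first_tile + t) * TILE_BYTES + row * ROW_BYTES;
        memcpy(block + t * ROW_BYTES, src, ROW_BYTES);
    }
}
```

The per-tile version produces eight scattered 4-byte writes per tile, while this ordering produces whole 32-byte blocks, which is the unit an SQ writes.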

One advantage of the SQs is that there are no cache misses on writes. When writing to a cache line not already in cache, the SH4 has to load what's already in RAM at that location before performing the write. The MOVCA.L instruction exists to avoid this delay, but with the old version scattering its writes around, it was hard to use it correctly. Since the SQ version always writes 32-byte, cache-line-sized blocks, it's easy to modify it to use MOVCA.L to avoid the cache miss penalty. This change also allows the cache flush step to be eliminated with an OCBP or OCBWB instruction.

Another problem with writing to cache is cache thrashing. Reading from the Genesis's VRAM can force out the cache line we are writing the texture to, and writes to the texture can evict data we are about to read, if the addresses happen to line up wrong. The SH4's cache has the "operand cache index" feature, which can prevent some of this. It basically splits the 16 KB direct-mapped data cache into two software-controlled 8 KB data caches that can't cause each other to thrash. I modified KOS to enable this.
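A small model of how the cache entry is chosen may help here. As far as I understand the SH4 documentation, the operand cache normally selects one of its 512 lines from address bits 13:5, and with the index mode enabled bit 13 is replaced by address bit 25. The function below is only a sketch of that selection, with a made-up name, and the example addresses assume the same RAM is also reachable through an alias that has bit 25 set.

```c
#include <assert.h>
#include <stdint.h>

/* Model of operand cache entry selection (512 lines of 32 bytes).
   oix == 0: entry comes from address bits 13:5.
   oix != 0: bit 13 is replaced by address bit 25 (index mode). */
static unsigned oc_entry(uint32_t addr, int oix)
{
    unsigned low = (addr >> 5) & 0xFFu;        /* address bits 12:5 */
    unsigned top = oix ? (addr >> 25) & 1u     /* bit 25 picks the half */
                       : (addr >> 13) & 1u;    /* normal: bit 13 */
    return (top << 8) | low;
}
```

Without index mode, two buffers 16 KB apart land on the same entry and evict each other. With index mode on, putting the source in an address range with bit 25 clear and the destination in one with bit 25 set sends them to different 8 KB halves, so they cannot collide.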

With these changes on a DMA based version, the conversion time is only 0.58 ms, with a total time of about 0.93 ms after codebook and DMA overhead. It's still slower than using the SQs. I'm not sure where the 0.09 ms difference in the VRAM processing is. Maybe DRAM page misses caused by going between Genesis VRAM and the texture buffer?

I also tried pointing the SQ version to main RAM, then DMAing the results. It runs exactly as fast as the optimized DMA version.

As for using DMA for rendering… DMA basically slows down the CPU a bit while it's running. Whether or not it's worth it depends on the rendering code. Using DMA for sending polygons to the TA requires large main RAM buffers and increases input latency.

When working on my PVR driver, I did some (non-rigorous) testing of DMA vs SQs. On a test designed to be a worst case for SQs (very large polygons), switching to DMA made the CPU's T&L two to three times faster, since it wasn't waiting on the TA. But the gain would change depending on the size of the polygons, with the advantage getting smaller as the polygons got smaller. In another test, with my rendering code, drawing a game-like scene, I saw a 5% CPU slowdown using DMA over SQs. I didn't try a SQ best case situation. It probably wouldn't be as big of a difference as DMA best case, but still into double digit percentages.

So DMA TA submission seems to have better performance than SQs under worst case conditions, but under the right conditions (which may require more work to achieve) SQ can be faster, or at least the same speed, without the extra latency and main RAM buffers.

For Gens4All, drawing 8x8 quads, I highly doubt DMA would be faster than SQs. Each quad is so tiny (TA doesn't have to do much work writing pointers to the tile matrix) and there's a bit of delay in between each tile while it processes things like palettes and flipping and priority (TA has time to finish writing the vertex data and pointer), so I think it's unlikely to help here. The tile rendering function does some table look ups to figure out what texture and UVs to use, and DMA would probably slow this down.
MastaG wrote:TapamN, if you create a Patreon account I'd be happy to support you as well.
I don't think I could produce enough results on a constant enough interval for it to be fair for someone to give me money. I doubt there are enough people out there who would be interested anyways, so it probably wouldn't be worth the effort.
MastaG wrote:Why not spend some time forking the official kos repository to your own GitHub account and simply commit all of your changes and improvements so everyone can benefit.
The only notable change to my current KOS setup at the moment is OCINDEX support. Aside from my old modified version of KOS's PVR driver (which had problems), all the changes I made seem to be in KOS in some form now.


As for the new release...

The color rasters are enough to get water in Sonic games working correctly. The water in CV Bloodlines doesn't seem to work? There's still no scrolling rasters.

The SRAM saving is kind of in a "barely working" territory at the moment. The filename for the save is generated by hashing the product id of the ROM. At the moment, there's nothing in place to avoid hash collisions. The hash is pretty weak, too. Saves always go to and are read from the first VMU found. Saves don't have an icon.
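To make the collision risk concrete, here is a sketch of deriving a save filename from a product ID. The post does not say which hash or filename pattern is used, so everything here is a stand-in: FNV-1a is swapped in as the hash, and the `G4A` prefix and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit FNV-1a, used here only as a stand-in hash. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Build a name like "G4A" plus 8 hex digits (hypothetical pattern).
   `out` must have room for 12 bytes including the terminator. */
static void save_filename(const char *product_id, char out[12])
{
    snprintf(out, 12, "G4A%08lX", (unsigned long)fnv1a32(product_id));
}
```

Any scheme like this maps many possible product IDs onto a fixed number of names, so two different games can end up with the same filename unless something checks for it, for example by storing the full product ID inside the save and comparing it on load.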

It's possible for the VMU file to grow between saves. Gens calculates the size of the save file by looking at how much of the SRAM array is zeroes, and completely ignores the ROM header. If the game doesn't initialize the entire SRAM, Gens will think the save is smaller than it really is. For example, on Phantasy Star 4, saving only in the first slot results in a smaller file than saving in the second or third slots. So it's a good idea to have extra free space on the VMU when saving, just to be safe.
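A sketch of why the file can grow, assuming the size is found by trimming trailing zero bytes from the SRAM array. That reading of "how much of the SRAM array is zeroes" is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the save if everything after the last non-zero byte is
   treated as unused. */
static size_t sram_used_size(const uint8_t *sram, size_t len)
{
    while (len > 0 && sram[len - 1] == 0)
        len--;
    return len;
}
```

If a game only writes its first save slot, everything after it is still zero and the computed size is small. Writing a later slot moves the last non-zero byte further out, and the next save needs more VMU space.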

When saving in the in-game pause menu, if you don't have space, you can swap out or erase files on the VMU and try again. The save menu that comes up when exiting the game only gives you one try, so it's probably better to try to save from the menu.

Like vanilla Gens, EEPROM saves aren't supported at the moment (Wily Wars, MW4, Micro Machines).

Pressing up on the analog stick pauses the game and opens a menu. If you're using a controller without an analog stick (like an arcade stick) you can also open the menu with A+B+X+Y+Start.

You can change the controller settings here. "DC 4B to Gen 3B" is the same as previous releases. It now correctly configures the Genesis to treat it as a 3 button controller. "DC 4B to Gen 6B" is a generic 6 button controller mapping. The controls look like this:
DC A -> Gen C
DC X -> Gen B
DC B -> Gen A
DC Y -> Gen Y
DC L -> Gen X
DC R -> Gen Z
DC Analog Down -> Gen Mode
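The mapping above could be expressed as a lookup table along these lines. The enum names and bit positions are made up for this sketch and are not the constants KOS or Gens use; Start and the D-pad are left out because the list above doesn't cover them.

```c
#include <assert.h>

/* Hypothetical button indices for each side. */
enum dc_button {
    DC_A, DC_B, DC_X, DC_Y, DC_L, DC_R, DC_ANALOG_DOWN, DC_BUTTON_COUNT
};
enum gen_button {
    GEN_A, GEN_B, GEN_C, GEN_X, GEN_Y, GEN_Z, GEN_MODE
};

/* "DC 4B to Gen 6B" as listed in the post. */
static const enum gen_button dc4b_to_gen6b[DC_BUTTON_COUNT] = {
    [DC_A] = GEN_C,
    [DC_X] = GEN_B,
    [DC_B] = GEN_A,
    [DC_Y] = GEN_Y,
    [DC_L] = GEN_X,
    [DC_R] = GEN_Z,
    [DC_ANALOG_DOWN] = GEN_MODE,
};

/* Translate a bitmask of pressed DC buttons into a Genesis bitmask. */
static unsigned translate_buttons(unsigned dc_mask)
{
    unsigned gen_mask = 0;
    for (int b = 0; b < DC_BUTTON_COUNT; b++) {
        if (dc_mask & (1u << b))
            gen_mask |= 1u << dc4b_to_gen6b[b];
    }
    return gen_mask;
}
```

The per-game variants mentioned below would then just be additional tables of the same shape.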

There are two extra variants for The Lost Vikings (Maps A,B,C to X,A,B, also good for Ranger-X) and Street Fighter II (SNES style layout). If you have an arcade stick or third party 6 button controller, you can pick "DC 6B to Gen 6B" to directly map the face buttons between each system (Mode is on R button).

The same mode is used for both player 1 and player 2. There's no multitap support at the moment.

There are also options in the menu to save SRAM to VMU and reset the game. There's a second menu that allows changing rendering options. The first one controls how scrolling is simulated.

Cell Scroll does not try to simulate line scroll at all (like previous releases) and Line Tilt tries to simulate line scroll by tilting the tiles (like Smash Pack). The auto split options look through the line scroll table to try to figure out the best way to render it. If the line scroll is constant through the tile, it draws it normally. If there's one place in the tile that changes, it cuts the tile in two pieces and draws line scroll perfectly. If there's more than one change in line scroll, it uses a fallback approximation, controlled by the "div" parameter. "1 div" falls back to drawing the tile how Cell Scroll or Line Tilt would normally draw it. "2 div" divides the tile into two 8x4 tiles and does higher resolution cell scroll or tilting with them. The "4 div" option divides the tile into four 8x2 tiles, for even higher resolution line scroll. With 4 div, only one layer (A/B) can be divided, since the CPU and GPU usage is high.
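The auto split decision described above boils down to counting how often the scroll value changes across a tile's 8 lines. A minimal sketch, with hypothetical names and assuming one scroll value per scanline:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DRAW_WHOLE,       /* scroll constant through the tile */
    DRAW_SPLIT_ONCE,  /* one change: cut the tile in two, exact result */
    DRAW_FALLBACK     /* several changes: use the "div" approximation */
} tile_scroll_mode;

/* hs: scroll values for the 8 lines of one tile.
   split_line: set to the first line that uses a new scroll value,
   or 0 if the scroll never changes. */
static tile_scroll_mode classify_tile(const int16_t hs[8], int *split_line)
{
    int changes = 0;

    *split_line = 0;
    for (int y = 1; y < 8; y++) {
        if (hs[y] != hs[y - 1]) {
            if (changes == 0)
                *split_line = y;
            changes++;
        }
    }
    if (changes == 0)
        return DRAW_WHOLE;
    if (changes == 1)
        return DRAW_SPLIT_ONCE;
    return DRAW_FALLBACK;
}
```

In the fallback case the renderer would then pick 1, 2 or 4 sub-tiles according to the "div" setting.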

The fastest options are Line Tilt and Cell Scroll without auto split. If the game has trouble maintaining 60 FPS, try switching to those. For Sonic 2's special stages, switch to "Auto Split (4 div A)", or else the pipe will be low resolution.

It's possible to disable the shadow/highlight emulation. Currently, only tilemap shadows work. Sprites cannot shade or highlight a pixel, although sprites are affected by tilemap shadows.

Two options in the pause menu enable or disable the performance graph and a raster display. The graph shows the timing of the SH4 CPU, CPU render time, GPU render time, and frame length. The raster display shows short lines on the left edge of the screen when certain types of raster effects are detected.

The option "Render at VBlank Start/End" controls when the screen is rendered relative to the Genesis's VDP timing. Some games display better/worse depending on the setting. Previous versions rendered at VBlank Start, but this version defaults to VBlank End because it has better synchronization with color rasters. If things aren't showing up, try changing this setting.

Due to how the emulator is set up, when the pause menu is displayed, the game is always rendering at VBlank Start, so toggling the setting doesn't update the screen, and the game might appear differently paused than how the game looks while playing if set to VBlank End.

There's currently no way to save controller and render settings to the VMU, but they are preserved when switching between games.

A prebuilt ELF executable is included with the source. I left out the .o files this time.

Edit: Attached file removed because of bug. Use the attachment from this post.
Last edited by TapamN on Mon Sep 20, 2021 2:14 am, edited 1 time in total.

Ian Micheal
Developer
Posts: 6035
Location: USA

Post by Ian Micheal »

This is brilliant, TapamN. Thank you for your hard work and for explaining it :) Yes, there is a point to working on this. I was not supporting any raster effects. The cut-down DreamNeo CD supports no raster effects either, or it would be very slow as well.

Thanks again, you explained it very well.

Top props for this version.

TapamN, could you help with GLdc? There is a bottleneck.

I'm sure your SQ approach would really help.

https://gitlab.com/simulant/GLdc
Attachments
output.png

TapamN
letterbomb
Posts: 150

Post by TapamN »

I just noticed a bug in the in-game menu. If you try to save from it, it softlocks on the save screen. I made a last minute change to how the menu works and didn't test it fully. The attached file has the bug fixed.
Ian Micheal wrote: TapamN, could you help with GLdc? There is a bottleneck.

I'm sure your SQ approach would really help.

https://gitlab.com/simulant/GLdc
I can try looking into it at some point. Is there a benchmark program that can be tested? Is it polymark/quadmark from the samples directory?
Attachments
gens4all.7z
(909.44 KiB) Downloaded 204 times