Optimizing FPS?

Started by Xeneonic, June 09, 2014, 05:25:42 PM


Xeneonic

Greetings,

As you may notice from a certain screenshot here (0.3 game seconds per real-life second at 1x speed), maps like these require a lot of buildings in order to survive, but that really cripples performance. I'm running a $2500 desktop system (~6 months old), but I barely notice a performance increase compared to, say, a $1600 laptop I bought 3 1/2 years ago.

The laptop is dual-core and the PC is quad-core. Assuming no multi-threading takes place in this game, a processor that is 3 years newer (and not a mobile part) with a 4.6GHz overclock should still provide at least double the performance. I assume the game is mainly CPU intensive, yet I am hoping for ways to shift more load towards the GPU. Any tips are welcome. I also assume things like DirectDraw wrappers wouldn't help, this being a Unity-based game.

In comparison, my girlfriend plays Sims 3 with absolutely all expansions and addons. Simply loading the town on the laptop takes about 10 minutes (Sims 3 is notorious for its slow loading times), yet takes less than two minutes on this PC. This should give a bit of an indication that the PC is quite far ahead compared to the laptop, but for CW3 this doesn't seem to make much difference. (Sure there's some, but maybe a 30% improvement at best, 15% at worst)

Edit: Did some investigation. It seems Draw Call Batching would solve pretty much the whole issue if it isn't done yet (or the FPS loss must come from something else, such as layer upon layer of Creeper and anti-creeper battling each other?)

Would be really nice to have some feedback on this xD

Edit2: Perhaps having an option to disable (anti-)creeper transparency would help as well?

Karsten75

CW3 is a GPU-based game.  All the creeper simulation and flow is graphics-based. What graphics devices do you have in the computers you referenced?

Also, if you hit "\" (backslash) in-game you should get a little display giving you FPS. You can then also look at Task Manager to get CPU utilization results.

I know that early in the development of CW3 I tried playing on an Intel Core i5-2500K with only integrated graphics - that didn't last long and things improved greatly when I got a discrete GPU.

knucracker

The game already does extensive draw call batching... without it, there would be thousands of draw calls per frame.  The Creeper, even on the largest map, will produce around 16 draw calls.  On smaller maps, fewer...  Units and ammo get batched, as do packets and a million other little things (effects, etc.)  The in-game GUI is also heavily optimized for lowering draw calls.  But all this together still only limits draw calls to 100 or so.

Performance has a lot to do with the map's size, what is put on the map (CRPL and other), and what you build as a player.  For instance, if you create a giant, massively connected graph of reactors, then every packet has a lot more work to do to find the shortest path to a target.  Energy balancing also has to be done across all of those reactors, etc.  Many of the computation problems scale linearly, or n log(n), or worst case n^2.  That is one reason why when you double computing 'performance' you don't always see double game speed.  The game is also heavy in terms of memory access.  The slower your memory, the worse the game will perform.  This is because the creeper data structures get looked at on a large scale every frame.  A 256x256 map is a 65536 element array that gets looked at 30 times per second (nearly 2 million inspections), multiplied by looking at 2 neighboring cells and a few other checks.  That results in roughly 10 million int array accesses per second, and that's before any shots are fired (that's just for creeper calculations).
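The numbers above can be checked with a quick back-of-the-envelope script. This is only a sketch: the figure of 5 array reads per cell (the cell itself, 2 neighbors, and about 2 further checks) is an assumption chosen to match the "roughly 10 million" estimate, and `max_size_gain` is a hypothetical helper that illustrates why doubling compute power does not double what the game can handle when the work grows faster than linearly.

```python
# Rough estimate of creeper array traffic on a 256x256 map.
MAP_WIDTH = 256
MAP_HEIGHT = 256
FRAMES_PER_SECOND = 30
# Assumption (not from the post): about 5 int reads per cell visit,
# i.e. the cell itself, 2 neighbors, and roughly 2 other checks.
ACCESSES_PER_CELL = 5

cells = MAP_WIDTH * MAP_HEIGHT                        # 65536
inspections_per_second = cells * FRAMES_PER_SECOND    # 1966080, "nearly 2 million"
accesses_per_second = inspections_per_second * ACCESSES_PER_CELL  # 9830400


def max_size_gain(speedup, exponent):
    """If cost grows like n**exponent, return how much larger n can get
    for a given hardware speedup while keeping the same frame time."""
    return speedup ** (1.0 / exponent)


if __name__ == "__main__":
    print(cells, inspections_per_second, accesses_per_second)
    # With n^2 work, twice the compute handles only about 1.41x the units.
    print(round(max_size_gain(2.0, 2), 2))
```

With linear work a 2x faster machine handles 2x the load, but for the n^2 cases it only handles about 1.41x, which fits the observation that a much faster desktop feels only modestly faster on an overbuilt map.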

Now, on the other hand, I've never played a mission where I had to build so much stuff that I had a massive slow down.  It's just not my style to overbuild.  That said, when the game was designed I didn't want to enforce limited units.  I knew I opened the door to complaints of performance issues, but I decided to "caveat emptor" when it came to building units.  If you want to build a massive army, have the iron to devote to it, and/or don't care about performance then have at it.  On the other hand, most missions (always excepting for some custom maps) don't require you to build hundreds and hundreds of units.

Xeneonic

Quote from: Karsten75 on June 09, 2014, 05:37:04 PM
I know that early in the development of CW3 I tried playing on an Intel Core i5-2500K with only integrated graphics - that didn't last long and things improved greatly when I got a discrete GPU.
Laptop: NVIDIA GeForce GTX 460M
Desktop: AMD Radeon R9 290 (With a 10/15% overclock on core/memory respectively)

The desktop card runs at a 2560x1600 resolution, the laptop at 1920x1080. My core game is World of Warcraft, in intense fights (25-man raid bosses are my go-to matchup). The desktop pulls 60-70 FPS with everything on maximum (8x AA; exception: Shadows on High, not Ultra), while the laptop reaches 30 FPS on medium (0x AA) in the same fights, each at its respective resolution.



Quote from: virgilw on June 09, 2014, 06:43:15 PM
For instance if you create a giant massively connected graph of reactors, then every packet has a lot more work to do to find the shortest path to a target.
I'm not a programmer, so I apologize in advance for the dumb advice;

Would it be possible to do the check once from source to destination taking only the fewest "hops" and ignore all other checks until a new path is made available or current one is destroyed? (I'm talking from a computer network specialist perspective here, with how switches, routers etc work on a network)
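The idea in the paragraph above (compute the fewest-hops route once, then reuse it until the network changes) could be sketched as follows. This is a hypothetical illustration, not how CW3 actually routes packets: the `PacketNetwork` class, its method names, and the use of hop count rather than distance are all assumptions.

```python
from collections import deque


class PacketNetwork:
    """Hypothetical network that caches fewest-hop paths until links change."""

    def __init__(self):
        self.links = {}      # node -> set of directly connected nodes
        self._cache = {}     # (source, target) -> path list, or None
        self.searches = 0    # number of real searches, for illustration

    def connect(self, a, b):
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)
        self._cache.clear()  # a new path may now exist

    def remove(self, node):
        for other in self.links.pop(node, set()):
            if other in self.links:
                self.links[other].discard(node)
        self._cache.clear()  # a cached path may be broken

    def path(self, source, target):
        key = (source, target)
        if key not in self._cache:
            self._cache[key] = self._search(source, target)
        return self._cache[key]

    def _search(self, source, target):
        """Breadth-first search: finds a path with the fewest hops."""
        self.searches += 1
        parent = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node == target:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            # Sorted order keeps tie-breaking repeatable between runs.
            for nxt in sorted(self.links.get(node, ())):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # target is not reachable
```

The trade-off is that every build or destruction event throws away the whole cache, so on a map where structures change constantly the saving may be smaller than it looks.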

Quote from: virgilw on June 09, 2014, 06:43:15 PM
Energy balancing also has to be done across all of those reactors, etc.
So I guess it's not feasible to just add the generation of each, add them into a sum of a total, then keep it that until a change occurs in energy generation to make a new calculation?

Quote from: virgilw on June 09, 2014, 06:43:15 PM
Now, on the other hand, I've never played a mission where I had to build so much stuff that I had a massive slow down.  It's just not my style to overbuild.

I do concur. On some maps, however, spores keep flying towards you and you need the energy generation to keep up with the output of the beams. Of course the terps were massive overkill, but I added them for screenshot enjoyment.



Taking everything into consideration, this does seem to be one of the main reasons (other than fun factor/gameplay) that some similar games (in terms of resource usage) such as Supreme Commander use a tech system where you can upgrade the individual buildings (Mass Extractors) to output more. Your Forge already helps tremendously, but for maps like these we'd need the cap to be much higher for fewer reactors to be viable for survival. That should theoretically also immediately solve any FPS issues from creating too many reactors and paths for packets.


Thanks for the replies, it's certainly some food for thought :)

Xeneonic

Quite a bit more analyzing:

1. CPU core seems to be the limiting factor. Setting CW3's affinity to one core (dedicated only to CW3), usage immediately goes up to 100%. Assigning another core (so now it runs on Core 6 + 7, again dedicated) shows usage of about 110-120% (usually one core at 93-100% and the other at 10-20%), and this seems to reliably increase FPS by 10-20%. (Going from 10-11 FPS to 12-14; I swapped back and forth about 15 times and reloaded the save to generate reliable results.) Enabling more cores for CW3 has no further impact. This is most likely because setting affinity to two or more cores on non-multithreaded games just juggles the load, and perhaps what you do gain comes from distributing overhead.

So, as with most games, CW3 scales badly with multi-threading. I would also assume that, since my CPU is the bottleneck, my memory is having no trouble at all. (Tests also indicate this: memory doesn't seem to get anywhere near its maximum usage. I'm talking about activity, of course, not storage.)

Apparently Unity allows multithreading just fine and techmage made a very awesome post in there you might want to check out. I have quite limited programming experience however so perhaps this is all bread and butter for you or just not applicable for a game like CW3. I do recommend glancing it over at least ^^

I also wonder if there's any load that can be shared more onto the GPU, but I would assume it'd be pretty niche if any.

Karsten75

Quote from: Xeneonic on June 09, 2014, 05:25:42 PM

Edit2: Perhaps having an option to disable (anti-)creeper transparency would help as well?

Have you checked the Visibility options? You can turn off soylent (green collector circles) and creeper rendering.

And just what map are you having these issues on? In case you didn't get it from Virgil, the entire game has been highly optimized and there are very few maps that are CPU intensive.  For most maps and most playstyles, performance is not an issue!

knucracker

Multithreading isn't as much of a technical platform issue as it is a problem domain issue.  Some problems can't be run in parallel.  Some can, but only at great synchronization expense. 

Funny story... a long time ago when I was working as a software developer for the equivalent of Initech ( see http://en.wikipedia.org/wiki/Office_Space ), multi processor systems started to become popular.  Product managers of course naturally assumed that a dual processor machine would run our software twice as fast.  At the time I was fond of saying, "Next time you need to go to the bathroom, wipe your butt in parallel with pooping.  You'll get done twice as fast." (I might have used more colorful vernacular at the time :) )

In CW3, each game frame has to do a large sequence of things and that sequence must be repeatable and deterministic.  This is part of the game design where frame by frame super determinism is desired.  Now, I might have been able to split up some of the processes to run in parallel within a frame.  I'd probably be wrong to say it couldn't be done.  But the stability of the system, the synchronization overhead, and the increased development time would all have been things to contend with.
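A toy illustration of the frame-by-frame determinism described above: every phase runs in a fixed order and any randomness comes from a seeded generator, so the same inputs always give the same state. Everything here (the phases, the state fields, the numbers) is made up for the example and is not taken from CW3.

```python
import random


def step(state, rng):
    """Advance one frame; the phases always run in the same order."""
    state["energy"] += state["reactors"]             # 1. energy production
    spent = min(state["energy"], state["demand"])    # 2. packets drawn this frame
    state["energy"] -= spent
    growth = rng.randint(0, 3)                       # 3. creeper emission (seeded)
    state["creeper"] = max(0, state["creeper"] + growth - spent // 4)
    state["frame"] += 1


def simulate(seed, frames):
    """Run a whole game from a seed; identical seeds give identical results."""
    rng = random.Random(seed)
    state = {"frame": 0, "energy": 0, "reactors": 3, "demand": 5, "creeper": 10}
    for _ in range(frames):
        step(state, rng)
    return state


if __name__ == "__main__":
    print(simulate(7, 300) == simulate(7, 300))
```

Note that phase 2 reads what phase 1 wrote and phase 3 reads what phase 2 wrote. Running those phases on separate threads would need synchronization at each hand-off to keep the result repeatable, which is the overhead mentioned above.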

arandomhalo

Reactors have calculations?  I thought that the energy they made was instantly available at the CN.  From what you're saying, it sounds like the packets work the same way for generation vs consumption, only you can't see them in the generation phase.

On a more general note, I'm often interested in game mechanics like this.  It seems these discussions get sprinkled across the forums.  Is there a wiki page or forum thread that pulls some of these topics together?  For example, how packets are requested, whether there is a one-frame penalty for passing through a node like there was in CW1, how a terp works on a cell if it's low on energy, how digitalis grows, how demand gets split up between CNs, stuff like that.  People do really interesting analyses on soylent optimization or creeper destruction per unit energy. 

4xC

No wonder this game is bound to never reach iOS. There are just too many factors and variables to account for.
C,C,C,C

Asbestos

Bah! You guys just need to be more patient. My terrible computer runs CW3 at 5-6 FPS usually.

Xeneonic

Quote from: 4xC on June 12, 2014, 11:52:43 AM
No wonder this game is bound to never reach iOS. There are just too many factors and variables to account for.

I am quite confident you could optimize a whole lot more. Programming the in-game network to work the way real-life computer networks do would severely cut CPU usage when you stack up reactors. On top of that, you could optimize so far that the game bakes, so to speak, a megatexture of where you place the reactors, so that they are blended in 100% like a normal texture on the map. Then, assuming the network updates are fully optimized as well, you could fill the whole map with reactors without any performance loss at all:

Think of it like this: A mortar leaves a crater on the ground, which requires the game to place a visual effect on the floor. The mortar shoots again, but with a 0.0001% error margin (this is an exaggeration, used as an example), so now the game creates a second visual effect on the floor. If you keep repeating this, eventually you'd have thousands of different textures (whether they're near each other or not is irrelevant for this example). Now imagine the game making a "screenshot" of the floor including the thousands of textures and reloading the whole map, but instead of loading the map layout and adding the thousands of textures, it just places the "screenshot" (megatexture). Now you basically have 0 performance loss. The weakness of this system is that it would be pretty much impossible to remove those crater visuals IF they overlap. Thankfully this is never the case with buildings, so this weakness would not affect reactors. In memory you could still say "@240x240 = reactor" and you could have collision detection for creeper and whatnot. You could still have it selectable, but instead of clicking on the reactor you'd just click on an empty hitbox over it (visually no different for you, but very much so for the game).

The single problem is that this would require a game overhaul. But I do think it is possible, with a lot of work involved. Virtual texturing is also a possibility, but Unity doesn't support that, I believe. Coding that in C++ on your own might be nightmarish :(

Karsten75

Quote from: Xeneonic on June 13, 2014, 06:14:11 AM
I am quite confident you could optimize a whole lot more.

I'd love to play any game you have created. Can you please link me? Or are you merely conforming to the old saying that  "the best oarsman stands ashore?"

I've seen a large number of people come on here and try to create parts of the game. One guy even tried to create some of it in 3D. It seems easy, and then eventually they disappear, never to be heard from again.

knucracker

A couple people have asked about reactors, so I'll bite :)

Reactors and Collectors simply add to energy each game frame.  Easy enough, right?  Well, collectors add based on 'their' green area.  But of course collectors can overlap their green area.  So energy production is more a function for total green area than collectors.  That's one problem that has to be solved. 
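To make the overlap problem concrete, here is a small sketch in which energy follows the union of covered cells rather than the number of collectors. The circular footprint, the radius, and the function names are assumptions for illustration only; the real game's collector shape and energy formula are not given in this thread.

```python
def covered_cells(cx, cy, radius):
    """Grid cells within `radius` of a collector at (cx, cy)."""
    return {
        (x, y)
        for x in range(cx - radius, cx + radius + 1)
        for y in range(cy - radius, cy + radius + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    }


def green_area(collectors, radius):
    """Union of all collector footprints; overlapping cells count once."""
    area = set()
    for cx, cy in collectors:
        area |= covered_cells(cx, cy, radius)
    return area


if __name__ == "__main__":
    far_apart = green_area([(0, 0), (10, 0)], 2)
    adjacent = green_area([(0, 0), (1, 0)], 2)
    # Two collectors placed side by side cover fewer cells than two far apart.
    print(len(far_apart), len(adjacent))
```

Because the set union counts each cell once, packing collectors tightly yields less green area per collector, so production cannot be computed by simply multiplying a per-collector value by the collector count.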

Now, say you have energy production based on green area rather than collectors solved.  Next problem is, 'what' green area(s).  There could be 1 giant connected green area or there could be 50 small areas.  So pass on that and think about reactors.  Reactors are simple, they just add a fixed number of energy per game frame (excepting for upgrades, assuming the forge is online and powered).  But there can be 3 Command Nodes on a map.  Reactors only provide power to the Command Nodes on their network.  So a reactor has to determine what CN(s) to give its energy to.  Command Nodes have finite energy storage, so after you have implemented an algorithm to determine which CN to give energy to, you have to make sure you don't overflow max storage.  Of course the CNs on the same network balance their energy per game frame (like two capacitors in series).  Also, the max storage is for storage across game frames, not a limit on production during a game frame.  Say you have 1000 energy production from reactors and want to generate 1000 packets in that same game frame.  But you only have 100 storage in a CN.  This scenario should work, so you can't just clip reactor production per game frame at 100.

Then there is packet production.  Within a given game frame N packets could be produced and they will be produced by the 'nearest' CN that should produce them.  But that CN might deplete its local energy storage very quickly.  So it has to pull energy from other connected CNs on the same network within a given game frame, or energy deduction has to happen in a distributed fashion with a little bit coming from each CN on the same network.  That has to be meshed with the above reactor production system.
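The storage scenario above (1000 production, 1000 packets, only 100 storage) can be sketched for a single Command Node like this. It is a deliberately simplified illustration: `run_frame`, its parameters, and the flat per-packet cost are assumptions, and it ignores multiple CNs, balancing, and guppies entirely.

```python
def run_frame(stored, production, packet_requests, packet_cost, max_storage):
    """Simulate one frame for a single CN.

    The storage cap is applied only to what carries over to the next
    frame, not to the energy that flows through during this frame.
    """
    available = stored + production                      # not clipped yet
    packets = min(packet_requests, available // packet_cost)
    available -= packets * packet_cost
    stored = min(available, max_storage)                 # clip the carry-over
    return stored, packets


if __name__ == "__main__":
    # 1000 produced and 1000 packets wanted, with only 100 storage:
    print(run_frame(0, 1000, 1000, 1, 100))
    # Surplus beyond storage is lost at the end of the frame:
    print(run_frame(50, 1000, 200, 1, 100))
```

Clipping `available` to `max_storage` before deducting packets would limit the first case to 100 packets, which is exactly the behavior the post says must be avoided.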

This is not impossible to solve, any game with 2 or more CNs and reactors solves these problems.  The game even has provisions for energy guppies that both store and release energy (each complicates the algorithm because they act like little Command Nodes for energy).  When there is only one CN, the problem is easier to solve than when there are 2 or more.  The game knows this and has two different solutions to the problem so it runs more optimally when there is only one CN.  Now I'm not saying all of this to toot any horns, or be glib, or dismissive, or anything like that.  I also won't claim to have optimized everything, or to have a perfect optimal solution where I have implemented optimizations. 

On the contrary, I thought (and think) a lot about optimizations in almost everything I do.  I kinda have to since my games tend to be 'algorithmic' and code heavy. It's always interesting (to me anyway) how some relatively simple ideas and requirements can turn into things that produce lots of subtle implementation difficulty.  For one of the most common things people take for granted that turns out to be a really interesting problem, see this:


Relli

Quote from: virgilw on June 13, 2014, 09:24:46 AM
It's always interesting (to me anyway) how some relatively simple ideas and requirements can turn into things that produce lots of subtle implementation difficulty.

I've noticed the same thing while trying to make my own map here.
I wanted to make a unit that pretends it's terrain. It just sits there and makes the creeper have to climb higher to get over it (or go around). And so I added the all-important line of code to it:

CurrentCoords dup2 GetTerrain 5 add SetTerrainOverride

Which finds the height I'm sitting on, adds 5 to it, and then sets the override to that value.
And then I realized...This only counts for the middle cell. There are 9 I need to all do the same thing. So I changed it to a do loop that does that. Problem sol...wait, what happens when this building is destroyed? The override would still be there, like a ghost unit. So I set up a second loop to undo the override on boom. I kept adding, and kept adding, whenever some little problem presented itself. And now the full code for this unit takes up an entire page. All from one little idea, and one little line of code. And I'm not even done with it yet.
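The progression described above (one cell, then all 9, then cleanup on destruction) might look like this outside of CRPL. This Python sketch uses a made-up `TerrainGrid` class rather than the game's actual API, and it assumes the override is computed from the base terrain height of each cell.

```python
class TerrainGrid:
    """Hypothetical terrain with per-cell height overrides."""

    def __init__(self, width, height, base=0):
        self.width = width
        self.height = height
        self.base = [[base] * width for _ in range(height)]
        self.override = {}   # (x, y) -> overridden height

    def height_at(self, x, y):
        return self.override.get((x, y), self.base[y][x])

    def footprint(self, cx, cy):
        """The 3x3 block around (cx, cy), clipped to the map edges."""
        return [
            (x, y)
            for x in range(cx - 1, cx + 2)
            for y in range(cy - 1, cy + 2)
            if 0 <= x < self.width and 0 <= y < self.height
        ]

    def raise_unit(self, cx, cy, amount=5):
        """Apply the override to every cell the unit sits on."""
        for x, y in self.footprint(cx, cy):
            self.override[(x, y)] = self.base[y][x] + amount

    def clear_unit(self, cx, cy):
        """Undo the override when the unit is destroyed (no ghost terrain)."""
        for x, y in self.footprint(cx, cy):
            self.override.pop((x, y), None)
```

Even in this small form, the edge clipping in `footprint` is one more of those "little problems" that only shows up once a unit is placed in a corner of the map.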

Xeneonic

Quote from: Karsten75 on June 13, 2014, 08:10:28 AM
Quote from: Xeneonic on June 13, 2014, 06:14:11 AM
I am quite confident you could optimize a whole lot more.
I'd love to play any game you have created. Can you please link me? Or are you merely conforming to the old saying that  "the best oarsman stands ashore?"

I've seen a large number of people come on here and try to create parts of the game. One guy even tried to create some of it in 3D. It seems easy, and then eventually they disappear, never to be heard from again.
I understand your defensive position towards virgilw. My post was in no way meant as criticism towards him personally or any of the games he created. I enjoy all the CW's tremendously and his next one will be an instant pre-order, even if it'd be a $60 game.

The ideas in my previous post were perfectionistic; as I mentioned, they would be undoable for a one-developer game, or at least not worth the time investment. Nor did I say they would be easy to pull off. All I wanted to imply was that it is not impossible to have an iOS version of CW3.

Quote from: virgilw on June 13, 2014, 09:24:46 AM
[omitted] ... For one of the most common things people take for granted that turns out to be a really interesting problem, see this:

I really loved that video. We don't use automatic transmissions in our country (at least 99% of the cars here are manual), but it was very interesting nonetheless. Regarding the reactors, I'd like to bring up the part about "Reactors give +200, but you only have 100 maximum; it should still work fine (100% efficiency) if your usage is more than 100". I'm not sure how TA or Supcom handle this part; I would assume it's a bit easier to pull off there because there are no packets involved. You could have 0 storage, 999999 income and 999999 usage with no problems, and the game would just say "income 0". If you had storage and your income were higher than your usage, your storage bar would always remain at exactly 100%; it wouldn't "tick off" the deficit and then add the income a frame later like in CW.

Things like that, I guess, make it much easier when not working with packets, and having different CNs/guppies would indeed complicate things much more. Then again, the system there works with energy and mass. You could compare it with a CW3 ore mine that consumes energy and gives ore: if there's no energy, you get no ore. Although I do suppose an analogy with totems would be even better, now that I think of it.

All in all, it is good food for thought. Have you experimented with dropping packets for energy and just using collector/relay links as "cables"? I wonder how much difference it would make, or what gameplay features you could implement. Cables further away from the source (CN/guppy) wouldn't transfer energy as efficiently, etc. Ah, the possibilities.