Thank you for your interest. We have great confidence in our product, and we have thought through every detail of WLSDM.
Here are our comments:
First of all, your domain is very large. However, we are sure WLSDM can handle it.
1 - In my experience, what makes the difference is collecting data… in order to monitor historical information and data (this helps me with troubleshooting)
Yes, a domain of this size is especially hard to monitor. If you have any unused servers, data sources, or JMS definitions, please delete them all. After that, do a fresh reinstall of WLSDM and do not track unused apps, servers, deployments, etc.
Historical data will be very useful for troubleshooting your problems. In particular, check the notification pages.
2 - About the deployments table: your dashboard lists the application status on every server of my cluster, and it is a little bit redundant
We have an asynchronous backend health feature on the health dashboard. Could you please enable it and let us know the result? (Go to the Health Dashboard page > Page Operations Menu > switch the button below to ON)
By the way, there is a JMX/t3 timeout setting on the system.WLSDM tab of the System Settings page. If your domain is unresponsive or has problems of its own, we cut the request off in order to keep the WebLogic domain healthy. Its default value is about 5 seconds. That is why you may see health status conflicts.
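To illustrate why a timeout can produce a different health status than the one you expect, here is a minimal Java sketch of a health check that gives up after a fixed time. It is not WLSDM's actual code: the class name HealthProbeSketch, the method checkWithTimeout, and the "UNKNOWN" fallback value are all hypothetical and used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HealthProbeSketch {

    // Hypothetical helper (not a WLSDM API): runs a health probe on a worker
    // thread and stops waiting once timeoutMillis has elapsed.
    static String checkWithTimeout(Callable<String> probe, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> result = executor.submit(probe);
        try {
            // Wait for the probe, but no longer than the configured timeout.
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The server did not answer in time: cut the request off.
            result.cancel(true);
            return "UNKNOWN";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "UNKNOWN";
        } catch (ExecutionException e) {
            // The probe itself failed.
            return "UNKNOWN";
        } finally {
            // Interrupt the worker so a slow probe does not linger.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A responsive server answers well within the 5 second limit.
        System.out.println(checkWithTimeout(() -> "HEALTH_OK", 5000));
        // A slow server (simulated with sleep) exceeds a short limit.
        System.out.println(checkWithTimeout(() -> {
            Thread.sleep(2000);
            return "HEALTH_OK";
        }, 200));
    }
}
```

In this sketch the second call reports UNKNOWN even though the simulated server would eventually have answered HEALTH_OK. This is the same kind of mismatch you can see on the dashboard when a server answers more slowly than the timeout allows.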