







1. Optimization



1.1 Emulate HMD performance



  • Launch with -emulatestereo
  • Set resolution to 2160x1200
  • Set r.ScreenPercentage 140


  1. In Editor Preferences > Play > Play in Standalone Game > Additional Launch Parameters, enter -emulatestereo
  2. Start in Standalone mode and set the resolution with r.SetRes 2160x1200 (windowed) or r.SetRes 2160x1200f (fullscreen)
  3. Set r.ScreenPercentage 140
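
To get a feel for what these settings cost, here is a minimal sketch of the arithmetic. It assumes the screen percentage scales each axis (so the pixel count grows with its square) and that the result is rounded to whole pixels; the function names are made up for illustration and are not engine API.

```python
def rendered_resolution(width, height, screen_percentage):
    """Internal render size for a given r.ScreenPercentage value.

    Assumption: the percentage scales each axis and the result is
    rounded to whole pixels.
    """
    scale = screen_percentage / 100.0
    return round(width * scale), round(height * scale)


def pixel_ratio(width, height, screen_percentage):
    """How many times more (or fewer) pixels are shaded than at 100%."""
    w, h = rendered_resolution(width, height, screen_percentage)
    return (w * h) / (width * height)


# The HMD emulation settings from the list above.
print(rendered_resolution(2160, 1200, 140))  # (3024, 1680)
print(pixel_ratio(2160, 1200, 140))          # roughly twice the pixels
print(pixel_ratio(2160, 1200, 10))           # about 1% of the pixels
```

Under these assumptions, 140% means the GPU shades close to twice as many pixels as the panel resolution, which is why the emulated setup is a useful stress test.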

1.2 Preparing for Profiling

  • Play in Standalone
  • Make sure the Editor is set to NOT update in realtime
  • Minimize the Editor
  • Make sure to turn off Frame Rate Smoothing (Project Settings)
  • Turn off VSync (r.vsync 0)

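Run r.ScreenPercentage 10. If the application suddenly runs faster, the performance bottleneck is on the GPU.


For the preparation steps before optimizing, the workflow in [CEDEC2017] UE4プロファイリングツール総おさらい(グラフィクス編) is a good reference. It is worth skimming quickly each time.

Also, always bake lighting before profiling; otherwise it is unclear what is actually being optimized. I fell into this trap myself: someone had placed an extremely large Lightmass Importance Volume in the project, which left Swarm stuck at the ExportScene stage on every bake. (See Why is Swarm taking forever to export scene?)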

1.3 VR Instanced Stereo

Edit > Project Settings > Rendering > VR


The goal is to render both eyes' views at the same time, which reduces draw calls.


1.4 Rendering Pipeline


For the pipeline flow I followed the article CEDEC2016 Unreal Engine 4 のレンダリングフロー総おさらい.



See also GPU Profiling on the official site.

1.4.1 Z PrePass

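PrePass DDM_AllOpaque (Forced by DBuffer)

Whether this PrePass is the same as the Z PrePass above still needs confirming. I saw it in the GPU Profiler, so it may be one part of the Z PrePass.

What I saw was not a plain PrePass but PrePass DDM_AllOpaque (Forced by DBuffer).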






What is PrePass DDM_AllOpaque?

It is mentioned briefly in the article [UE4]GPU Visualizer (GPU Profiling) Specification:

PrePass DDM_…


  • Early rendering of depth (Z) from non-translucent meshes.
  • Required by DBuffer decals.
  • May be used by occlusion culling.
  • Engine -> Rendering -> Optimizations -> Early Z-pass.

Cost affected by

  • Triangle count of opaque objects.
  • Depending on Early Z settings: Overdraw and complexity of Masked materials.


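What does Forced by DBuffer mean?

The English is easy to understand, but I could not find out what DBuffer itself was. I finally found a definition in the excellent SlideShare write-up mentioned above (or below?), [CEDEC 2016]UE4を扱うアーティストがつまづき易いポイントはここだ。Epic Gamesが解説する注意点と回避法:

DBuffer Decals, or Deferred Decals, draw decals into a dedicated decal buffer called the "D-Buffer", separate from the G-Buffer. Lighting and shading that use the G-Buffer are then performed while also taking the D-Buffer contents into account. If the decal results were merged into the G-Buffer, that would cause the kind of problem described earlier. The dedicated buffer exists to avoid this.

 Whether to adopt DBuffer Decals, which cost more but give accurate decals, or to choose a different technique and avoid DBuffer Decals, is something to weigh carefully.

The details belong in a dedicated article on DBuffer Decals. In short, once you decide to use DBuffer Decals and enable them (under Project Settings > Rendering > Lighting), the PrePass DDM_AllOpaque above is forced to run. The reason (according to the article, it is needed for drawing in the Pre-Lighting stage) will be covered in a later article.

The PrePass DDM_… material above also seems to be covered in UE4 Graphics Profiling: All Categories Guide (Rendering Passes), which I had bookmarked earlier but never read carefully.

I had planned to discuss Deferred Decals in another chapter, but for now the notes are collected in [UE4] Deferred Decal.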



From Unreal Engine 4 Console Variables and Commands:

Name: r.EarlyZPass
Help: Whether to use a depth only pass to initialize Z culling for the base pass. Cannot be changed at runtime.
  Note: also look at r.EarlyZPassMovable
  0: off
  1: only if not masked, and only if large on the screen
  2: all opaque (including masked)
  x: use built in heuristic (default is 3)

Name: r.EarlyZPassMovable
Help: Whether to render movable objects into the depth only pass. Movable objects are typically not good occluders so this defaults to off.
  Note: also look at r.EarlyZPass

Name: r.DBuffer
Help: Enables DBuffer decal material blend modes.
  DBuffer decals are rendered before the base pass, allowing them to affect static lighting and skylighting correctly.
  When enabled, a full prepass will be forced which adds CPU / GPU cost. Several texture lookups will be done in the base pass to fetch the decal properties, which adds pixel work.
  0: off
  1: on (default)

As an aside, from the thread Decal performance question:

This is from the docs:


The mesh complexity of the objects affected by the decal is not affecting the performance. The decal performance depends on the shader complexity and the shader box size on the screen.

We can further improve the per decal performance. Ideally the bounding box of the decal is small to get better per pixel performance. This can be done manually. An automated method is possible but a good designer can also adjust the placement to improve performance further.

Current limitations

We currently only support deferred decals and they only work on static objects. Normal blending is currently not wrapping around the object. Mip map computation is not done yet so you might see 2x2 block artifacts on object borders. Streaming is not yet hooked up so make sure the texture is not streamed. Masking decals (not affecting other object) is not fully implemented.

1.4.3 Base Pass

A summary of the base pass:


    Responsible for

    • Rendering final attributes of Opaque or Masked materials to the G-Buffer
    • Reading static lighting and saving it to G-Buffer
    • Applying DBuffer decals
    • Applying fog
    • Calculating final velocity (from packed 3D velocity)
    • In forward renderer: dynamic lighting

    Cost affected by

    • Rendering resolution
    • Shader complexity
    • Number of objects
    • Number of decals
    • Triangle count

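A lot of work is done in the base pass, and shader complexity is one of the most important factors affecting its performance.

Optimization

  • Shader Complexity: this view mode shows how complex each shader is.
  • Stat: the Stats window in the Material Editor shows the number of passes.
  • Rendering Resolution: can be inspected, and affects the quality of the G-Buffer and other render targets.
    • stat RHI: shows the memory used by the G-Buffer.

English articles are slower for me to read and harder to follow than Chinese ones, but some sentences lose their flavor when translated into Chinese. It is better to reread the original a few times. GPU Visualizer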


BasePass 0 = Opaque Meshes.

BasePass 1 = Alpha Masked Opaque Meshes for Z-depth.

BasePass Dynamic = Animated Vertices such as Skeletal, GeoCache(Alembic),etc.

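In my capture I only saw Dynamic under that name. The other entries were Static EBasePassDrawListType=0 and Static EBasePassDrawListType=1, and I am not sure whether they mean the same thing as BasePass 0 and BasePass 1.


// From the Unreal Engine 4 source: Runtime\Renderer\Private\BasePassRendering.cpp
EBasePassDrawListType DrawType = EBasePass_Default;

// The definition of the type (remaining enumerators are not shown in these notes)
enum EBasePassDrawListType
{
    EBasePass_Default = 0,
    // ...
};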



  • View0
  • View1



1.4.4 Pre-Lighting

1.4.5 Lighting

Unwrapping UVs for Lightmaps GPU Visualizer

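This is the GPU cost of the lighting section in the GPU Visualizer view opened by the ProfileGPU command.

To be clear, I named these headings after the order of the rendering pipeline, but the GPU Visualizer entries are not necessarily the work done in that stage. They may well be, but I do not know. I grouped them together only because the names match.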


  • Lights
    • DirectLighting
      • NonShadowLightings
      • IndirectLighting
      • ShadowLights


ShadowLights -> … -> ShadowProjectionOnOpaque



1.4.6 Reflection

Reflection Environment

Planar Reflection

1.4.7 Translucency


Separate Translucency


Project Settings > Engine > Rendering > Translucency


  • r.SeparateTranslucencyScreenPercentage XX: sets the resolution of this buffer
  • r.SeparateTranslucencyAutoDownsample: lowers the resolution automatically

1.4.8 PostProcess

1.4.9 Shadow


Fake Shadow


Capsule Shadow

  • SkeletalMesh: Capsule Direct (Indirect) Shadow, and so on

1.5 Command Introduction


stat SceneRendering


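I am not entirely sure about this stat, and searching online turned up no concrete answer. OcclusionCulling looked related, but optimizing in that direction did not help much, so it does not seem to be the same concept. Besides, the OcclusionCulling entry appears to belong to a different command, apparently stat InitViews.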




2. Blueprint Optimization


Tick Event

If Tick is not needed, turn off Tick Enable on the Actor. Alternatively, adjust the Tick Interval so the event is not called every frame.




Nativizing Blueprint


3. Landscape Optimization







  1. Turn off Navigation if it is not used
  2. Turn off Collision if it is not used
  3. Lower the LOD settings

4. HTC Vive



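  1. This raises a question: what happens if the game's frame rate is higher than the refresh rate of the display (the VR headset)?



V-Sync (Vertical Synchronization) addresses this by forbidding the back buffer from being copied to the display buffer until the display has refreshed (a conditional form of double buffering). If the FPS is higher than the refresh rate there is no problem: once the back buffer has been updated, the system simply waits. After the display refreshes, the back buffer is copied to the display buffer and the GPU can start drawing a new frame into the back buffer.

  2. What happens if the game's frame rate is lower than the refresh rate of the display (the VR headset)?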
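
For the second question, a rough sketch of the usual textbook model may help. It assumes plain double-buffered V-Sync with no reprojection or other VR compositor tricks, so every frame has to occupy a whole number of refresh intervals. The 90 Hz figure is the Vive's refresh rate; the function name is made up for illustration.

```python
import math


def vsync_fps(render_fps, refresh_hz):
    """Effective frame rate under plain double-buffered V-Sync (assumed model)."""
    if render_fps >= refresh_hz:
        # Faster than the display: the renderer waits for the next refresh.
        return float(refresh_hz)
    # Slower than the display: a frame that misses a refresh is shown at the
    # next one, so it lasts a whole number of refresh intervals.
    intervals = math.ceil(refresh_hz / render_fps)
    return refresh_hz / intervals


print(vsync_fps(120, 90))  # capped at the refresh rate
print(vsync_fps(80, 90))   # just missing 90 drops all the way to 45
```

In this model a game that can only manage 80 FPS does not display at 80 but at 45, which is why narrowly missing the refresh rate is so noticeable in a headset.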



Now to the main point: what does Smoothed Frame Rate Range do in UE4?

Smoothed Frame Rate Range

With Frame Rate Smoothing, the application determines what range is acceptable for frame rate wandering, so you can cap your frame rate to between the Min and Max allowable frame rates. Since this is application based, it makes these changes before any hardware vsync changes.







5. Some UE4 Concepts

Instanced Static Mesh



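  • Reduces draw calls

As I understand it, rendering one Actor means the CPU has to issue one draw call, so a huge number of Actors drags down the CPU. Reducing draw calls is therefore one direction for optimization.

At the same time, a draw call that carries insufficient information makes the GPU do extra work. That is the effect I ran into with InstancedStaticMesh.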
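
A toy model of that understanding, under the simplifying assumption that every separate Actor costs one draw call while instancing collapses all Actors sharing a mesh and material into one. The dictionaries and function names are hypothetical, not engine types.

```python
def draw_calls_separate(actors):
    """One draw call per actor (assuming one mesh and one material each)."""
    return len(actors)


def draw_calls_instanced(actors):
    """One draw call per distinct (mesh, material) pair when instanced."""
    return len({(a["mesh"], a["material"]) for a in actors})


scene = (
    [{"mesh": "Rock", "material": "M_Rock"} for _ in range(500)]
    + [{"mesh": "Crate", "material": "M_Wood"} for _ in range(200)]
)

print(draw_calls_separate(scene))   # 700
print(draw_calls_instanced(scene))  # 2
```

The CPU side improves dramatically, but the whole batch is now submitted as one unit, which is where the extra GPU work mentioned above can come from.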



  • Frustum Culling
  • Occlusion Culling



  1. ToggleDebugCamera: opens the debug camera from the console; fly to the spot you want to inspect
  2. FreezeRendering:


One thing to know about instanced static meshes is that if any part of the mesh is rendered, the whole of the collection is rendered. This wastes potential throughput if any part is drawn off camera. It’s recommended to keep a single set of instanced meshes in a smaller area; for example, a pile of stone or trash bags, a stack of boxes, and distant modular buildings.
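
The quoted behavior can be sketched in 2D. This is only an illustration, assuming the collection is culled by a single bounding box around all instances and the camera view is a simple rectangle; the helper names are made up.

```python
def bounds(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of the instances."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def overlaps(a, b):
    """True if two boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(rect, point):
    x, y = point
    return rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]


def wasted_instances(instances, view):
    """Instances submitted to the GPU although they are outside the view."""
    if not overlaps(bounds(instances), view):
        return 0  # the whole collection is culled as one unit
    return sum(1 for p in instances if not contains(view, p))


view = (-10, -10, 50, 50)
spread_out = [(0, 0), (100, 0), (200, 0), (300, 0)]
compact = [(0, 0), (5, 0), (10, 0), (15, 0)]

print(wasted_instances(spread_out, view))  # 3
print(wasted_instances(compact, view))     # 0
```

Keeping each instanced set in a small area keeps its bounding box small, so it is either mostly on screen or culled entirely.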



  • r.VisualizeOccludedPrimitives










  • CameraRotationThreshold (Default 45.0)
  • CameraTranslationThreshold (Default 1000)

In addition, r.AllowOcclusionQueries can be switched on and off manually.


  • r.ExpandAllOcclusionTestedBBoxesAmount


  • r.ExpandNewlyOcclusionTestedBBoxesAmount (Default=0.0f)


  • stat SceneRendering
  • stat InitViews
  • stat SceneUpdate

These commands are probably covered elsewhere. What I noticed here is that in stat InitViews one entry was taking a lot of my time:

Render Query Result


RenderQuery Result is when the render thread stalls waiting for the GPU to finish the Occlusion Query, and return the results to the render thread, so that it knows what to render.

At the same time, the game thread is stalled waiting for the render thread.

This can be turned on or off with the console command


0 - off 1 - on


What is an Occlusion Query?




Culling Distance

  1. Foliage Culling Distance
  2. Cull Distance Volume

UE4 Performance and Profiling




1.CPU/GPU Profiler


2.Profiling in a Build


  1. Minimize the noise that can interfere with profiling

    • Turn off everything you are not using
    • Turn off v-sync r.vsync 0
  2. Turn off Framerate Smoothing

  3. Make a Test build

    • Testing in a Development build inflates the Draw thread with noise

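Turn off as much noise as possible. The first two items are mandatory. As for the third, I am not sure I have understood it correctly: day-to-day development uses the Development configuration, so to keep noise down you build the project directly (that is, in Shipping mode) and optimize against that build.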


3.Profiling from within the Editor



  1. Play in Standalone
  2. Make sure the Editor is set to NOT update in realtime
  3. Minimize the Editor

    • VR > Editor Preference > Play > should minimize Editor on VRPIE
  4. Make sure to turn off Frame Rate Smoothing

  5. Turn off VSync

4.General Process

  1. Identify the bottleneck

    • Game Thread
    • Render Thread

      • CPU
      • GPU
    • Often jumps back and forth as you optimize

    • Use r.ScreenPercentage 10 to quickly check if you are GPU bound


Game Thread

  • Code or Blueprint

CPU Render

  • Object count,draw calls,culling

GPU Render

  • Shaders, overdraw,light

5.Measuring in Milliseconds

  • Use stat unit, not just stat fps

    • Largest number shows you the likely bottleneck
  • Milliseconds per frame

    • Frame: total time to finish each frame
    • Game: C++ or BP gameplay operations
    • Draw: CPU render time
    • GPU: GPU render time
  • You can also use stat unitGraph, which shows a line graph playback.
    • Mostly useful for spotting repeating hitches



  • Mostly useful to measure problems unrelated to Game Thread
  • Use stat unit to show milliseconds
  • Use r.ScreenPercentage 10
    • Or any number smaller than 100
    • Reduces number of pixels sent to the GPU
    • If things get faster, you were GPU bound
    • If they don't get faster, you were CPU bound
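
The same check as a sketch. The 10% tolerance is an arbitrary assumption to absorb measurement noise, and the inputs are frame times you would read off stat unit before and after lowering r.ScreenPercentage.

```python
def classify(frame_ms_full, frame_ms_low_res, tolerance=0.10):
    """'GPU bound' if rendering fewer pixels made frames noticeably faster.

    tolerance is an assumed noise margin, not an engine value.
    """
    speedup = (frame_ms_full - frame_ms_low_res) / frame_ms_full
    return "GPU bound" if speedup > tolerance else "CPU bound"


print(classify(16.0, 9.0))   # GPU bound
print(classify(16.0, 15.8))  # CPU bound
```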

6.Show Flags

One of the simplest ways to look for problems is to turn off parts of your scene.

Helps know when to look into reducing

  • LODs
  • Less translucency
  • Adjust lighting

show assetType or showFlag.assetType 0-1

  • Staticmeshes
  • Skeletalmeshes
  • Particles
  • Lighting
  • Translucency
  • Reflectionenvironment
  • Many more listed in docs

7.Diagnostic Tools-Realtime stats and view modes

Stat commands

  • stat fps
  • stat unit
  • stat scenerendering
  • stat gpu
  • stat engine
  • stat streaming
  • stat emitters
  • stat lighting

Stat SceneRendering

  • Only place to see draw calls

    • Draw call is a single request to GPU to draw something
    • Prime candidate for CPU slowdown on lower-end machines and also on mobile(less of a concern with Metal and Vulkan)
  • Also a good place to see time for:

    • Shadows
    • Decals
    • Post Processing
    • Lighting


Stat GPU

  • Relatively new (4.15)
  • Realtime readout from GPU
  • Gives highlights, but not details

    • Makes it very good for quickly targeting trouble spots
  • Use the full GPU profiler if you want to target individual things

    • Example: if you want to find specific lights that are casting shadows

Optimization View Modes

Shader Complexity

  • Shows how much your shaders are costing on the GPU
  • Good way to see overdraw issues

    • Overdraw is when a pixel must be drawn multiple times
    • One of the most common content issues for optimization
  • Graph at the bottom shows where the pixel and vertex shaders are in terms of performance

  • If you see a lot of red and white, reconsider your approach


Shader Complexity Mode is used to visualize the number of shader instructions being used to calculate each pixel of your scene. It is generally a good indication of how performance-friendly your scene will be. In general, it is used to test overall performance for your base scene, as well as to optimize particle effects, which tend to cause performance spikes with a large amount of overdraw for a short period of time.

Quad Overdraw

  • Helps show how you are using your polygon count on the screen
  • Can help show where meshes should be LOD-ed down

    • Too much green shows areas that should be simplified
    • Anything more than green is starting to get costly, commonly translucency overdraw
  • Very useful for MSAA on Forward Rendering, as the number of poly edges dramatically affects performance

Quad Overdraw in-depth

  • Your GPU breaks the view up into quads

    • 2*2 groups of pixels
    • This is more efficient than performing all operations on all pixels
  • Very small, or very long, thin geometry wastes pixels

    • Regular, large polygons make the best use of pixel quads, best use of GPU
  • Model with regular triangles and LOD aggressively

When you are looking at an opaque object on the deferred renderer and you see a lot of green, that means all 4 pixels of that quad had to be recalculated over and over.

You should probably be using lower LODs.
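
To see why thin geometry wastes pixels, here is a small sketch. It assumes the GPU always shades complete 2x2 quads, so every quad touched by a triangle costs four pixel-shader lanes no matter how many of its pixels are covered; the function name is made up.

```python
def quad_efficiency(covered_pixels):
    """Fraction of shaded 2x2-quad lanes that land on covered pixels."""
    quads = {(x // 2, y // 2) for x, y in covered_pixels}
    return len(covered_pixels) / (4 * len(quads))


block = {(x, y) for x in range(8) for y in range(8)}  # large, regular area
sliver = {(0, y) for y in range(8)}                   # 1-pixel-wide strip
speck = {(3, 3)}                                      # sub-quad triangle

print(quad_efficiency(block))   # 1.0
print(quad_efficiency(sliver))  # 0.5
print(quad_efficiency(speck))   # 0.25
```

The strip covers 8 pixels but touches 4 quads, so 16 lanes are shaded for 8 useful results. Distant meshes with tiny triangles behave like the speck, which is the argument for aggressive LODs.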


Shader Complexity + Quad Overdraw

  • Combines two powerful view modes into one
  • Useful to get an idea of expensive shaders and geometry at a glance
  • You will still need the individual settings to help diagnose specifics

Light Complexity

  • Visualizes the cost of scene lighting
  • As lights overlap, the colors shift from cool to warm to white
  • Only shows cost of lighting, not shadowing
  • Obviously, white is bad
  • Great way to see where you should be lowering light radii
  • By flipping this on and off, you can quickly see if the cost of any given light is “worth it”

Lightmap Density

  • Shows the density of texels for lightmap purposes
  • Color shifts from cool to warm as density increases

    • Most things can be blue
    • Shadow maps don’t often need to be very high res
  • Keep this as low as possible

    • Cost adds up quickly

Stationary Light Overlap

  • Only a maximum of 4 stationary lights can affect any given object
  • Beyond that, any other lights fall back to Movable (fully dynamic)
  • This view mode helps track down where that might be happening
  • Reminder to keep light radii as small as you can get them
  • Do you have a stationary sun?
    • Congratulations! That’s one of the four lights!
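
A sketch of the counting rule. It assumes a light affects a point when the point lies within its attenuation radius, and it models the sun as a light with infinite radius. Which specific light the engine demotes is not stated in these notes, so the sketch only reports how many exceed the limit; all names are hypothetical.

```python
import math

MAX_OVERLAPPING_STATIONARY = 4  # the limit quoted in the list above


def lights_affecting(point, lights):
    """Names of lights whose radius reaches the point."""
    return [name for name, center, radius in lights
            if math.dist(point, center) <= radius]


def overflow_count(point, lights):
    """How many stationary lights exceed the limit at this point."""
    return max(0, len(lights_affecting(point, lights)) - MAX_OVERLAPPING_STATIONARY)


def scene(radius):
    return [
        ("Sun", (0, 0, 0), math.inf),  # a stationary sun reaches everything
        ("A", (100, 0, 0), radius),
        ("B", (-100, 0, 0), radius),
        ("C", (0, 300, 0), radius),
        ("D", (0, -300, 0), radius),
    ]


origin = (0, 0, 0)
print(overflow_count(origin, scene(500)))  # 1: five lights overlap here
print(overflow_count(origin, scene(250)))  # 0: C and D no longer reach
```

Shrinking the radii is enough to get back under the limit in this example, which matches the reminder above.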

LOD Coloration

  • This mode shows the current mesh LODs in use by color coding them
  • Very fast way to go through your scene and verify that things are LODing when they should be
  • Interestingly, this mode clearly shows that the trees are not LODing at all in this project
    • Was able to diagnose frame drop instantly using this mode

8.Profiling Tools

CPU Profiling

  • Integrated tool to take apart a segment of your gameplay and see what’s happening on each tick

    • Very useful way to profile Blueprint performance
  • Measures a segment of time

    • Within that segment, can look at individual frames or averages
  • Requires two special Stat Commands

    • stat startfile & stat stopfile
    • These generate a log file between the interval of the commands
    • Profiler allows deep analysis of that log
  • Step down into world tick and see individual Blueprint functions

  • Can be used for CPU(Game and Draw) and GPU

The captured log can be opened and analyzed in UE4's Session Frontend.

GPU Profiling

  • Three methods to profile GPU functions

    • stat GPU command in the viewport
    • Recorded file log in the Session Frontend
    • GPU Profiler
      • Can dump out to either the log or its own UI
  • Great way to visualize the cost of:

    • Base pass
    • Lighting
    • Shadows
    • Post processing

Tracking Slow Frames

  • stat dumpHitches

    • The command is used to dump any hitches over a given time in milliseconds out to the log
    • Use command t.HitchThreshold 0.xx to set value (0.05 is default)
  • memReport -full

    • Full breakdown of how memory is being used
    • There’s a great blog post on how this works
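
What stat dumpHitches does conceptually can be sketched as a filter over frame times. Treating the 0.05 default quoted above as seconds is my assumption, and the function below is an illustration rather than the engine's implementation.

```python
def find_hitches(frame_times_s, threshold_s=0.05):
    """Return (frame index, frame time) for every frame over the threshold.

    Assumption: the threshold is expressed in seconds.
    """
    return [(i, t) for i, t in enumerate(frame_times_s) if t > threshold_s]


frames = [0.011, 0.011, 0.062, 0.012, 0.110]
print(find_hitches(frames))        # frames 2 and 4 are hitches
print(find_hitches(frames, 0.1))   # only frame 4 with a looser threshold
```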


startFPSChart and stopFPSChart

  • You can use the commands startFPSChart and stopFPSChart to create a diagram of framerates over time

  • You can call these at start and end of a Level Sequence to automatically read out the frame rates along a given course, as defined by a cinematic


9.Blueprint Optimizations - Or:Keeping the Kids from Eating the Crayons

  • Blueprints make it easy for folks to assemble gameplay logic
  • Best results often come with engineer mentorship
  • Common challenges
    • Reliance on Tick functionality
    • Over-use of expensive functions (iterating on many objects)
    • Abuse of hard references
Reliance on Ticking Blueprints

  • Tick means running on every frame
  • Blueprints should almost never need Tick

    • Remember to uncheck Enable Actor Tick in Class Defaults!
      • This is on by default so that the Tick event will work
  • Alternatives to Tick

    • Timers
    • Timelines
    • Manually enabling/disabling Tick on demand
  • Make sure to adjust Tick Frequency
    • 0.0= every frame

Expensive Functionality

  • Some functions are inherently expensive
    • Get All Actors of Class
    • Spawn
    • Anything that needs to iterate over a large group of objects or properties
  • Try not to use these if at all possible
    • If you are doing it to get a reference, consider having the referenced class pass itself up so the referencing object does not need to query
    • Use TSets instead of arrays
  • If you must use them, do so as seldom as possible
    • Preferably only once, such as at Begin Play
  • Heavy ConstructionScripts can murder spawn times.
    • Consider placing in the level beforehand

Hard References in Blueprint

  • It is very easy for Blueprints to generate references to each other
  • When you load a Blueprint, every other Blueprint it references must be loaded
    • And the Blueprints referenced by those
    • And so on, and so on...
  • This will not slow down in-game performance, but it can eat away at memory and load times
  • Some studios have thought the Editor just ran slow

    • Turns out they were loading most (or all) of their game on startup
  • Avoiding hard references:

    • Avoid casting operations unless you are certain you need them and know that they won’t cause issues

      • For instance, if a Pickup class can only interact with the player, it might be fine to have it cast to Player
      • But if the Player class references every other type of pickup and interactive object in the game, you will likely see problems
    • Instead, use Blueprint Interfaces

    • Try to get into the mentality of not needing a very specific reference type
      • Send your messages via an interface to a more generic class
      • If they land on something implementing the interface, great!
      • If not, no big deal
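
The interface mentality, sketched in plain Python rather than Blueprint. Interactable, HealthPickup and Rock are hypothetical classes; the point is that the sender never needs to know the concrete type of the target.

```python
class Interactable:
    """Hypothetical interface: anything that can respond to an interaction."""

    def interact(self, instigator):
        raise NotImplementedError


class HealthPickup(Interactable):
    def interact(self, instigator):
        return f"{instigator} healed"


class Rock:
    """Does not implement the interface."""


def send_interact(target, instigator):
    """Deliver the message if the target implements the interface, else ignore it."""
    if isinstance(target, Interactable):
        return target.interact(instigator)
    return None  # landed on something else: no big deal


print(send_interact(HealthPickup(), "Player"))  # Player healed
print(send_interact(Rock(), "Player"))          # None
```

The sender only depends on the small interface, so it does not pull every pickup class into memory along with it.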

Other Blueprint Optimizations

  • Avoid doing too much of any one thing (like with any scripting language)
    • Too much functionality in a single class
      • Break things up
      • Use a class hierarchy
        • But on that note, also avoid…
    • Class hierarchies that are too deep
    • Too many components within a class
    • Too much high-end math
      • Use the Math Expression node- it’s optimized to speed things up
    • When all else fails for BP performance: GO NATIVE!
      • At Epic, many of our Blueprints derive from generic C++ classes
      • Yours should, too!
      • Keep all the heavy functionality in code, leave the lighter stuff for Blueprint

What Actors are Ticking?

  • Did you lose track of what’s ticking? Use dumpticks
  • Dumps a list of all ticking Actors out to the log, telling you how many tick functions are called
  • Also shows how many enabled and disabled ticking Actors are in the scene

10.Draw Thread Optimization

CPU Rendering Considerations

  • Bottlenecks at the Draw thread are often caused by doing too much:
    • Too many draw calls
    • Occlusion queries - see above
    • Simulating too many particles
    • Adding too many lights - often hits the GPU harder
  • Generally the best way to speed up the Draw thread is to do less
  • Find every way you can to put fewer things on the screen
  • Generally this means either being very clever with content or using the integrated tools within UE4 to start combining objects

Actor Merge Tool

  • Located under Window > Developer Tools
  • Combines a selection of meshes into a new asset, replacing the originals
  • Can also combine Materials via Simplygon
  • Works best with many meshes having the same Material

The Actor Merge tool works best when the meshes share as many Materials as possible. If you try to combine 20 meshes and each has its own Material, you are not benefiting from the tool because every Material is going to make a draw call anyway.
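
The 20-mesh example as a sketch, under the simplified assumption that a merged mesh needs one draw call per distinct Material. The function name is made up for illustration.

```python
def draw_calls_after_merge(mesh_materials):
    """Draw calls for a merged mesh: one per distinct material (assumed model)."""
    return len(set(mesh_materials))


unique_materials = [f"M_Prop_{i}" for i in range(20)]  # each mesh has its own
shared_material = ["M_Props"] * 20                      # all meshes share one

print(draw_calls_after_merge(unique_materials))  # 20: merging gained nothing
print(draw_calls_after_merge(shared_material))   # 1
```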

Instanced Static Meshes

  • Mechanism for generating multiple instances of a given mesh, with each considered part of the same mesh object.
  • Can only be created through code or Blueprint at this time, often via the Construction Script
  • Very easy to create a Blueprint set that helps generate this
    • Placement Blueprint that is used to preview where mesh will be
    • Radius based ISM Blueprint that gathers transforms from Preview BPs and populates the instances with itself.
    • Be careful of Editor Utility class BPs-they’re Editor only!
  • Also consider Hierarchical Instanced Static Meshes
    • Handle their own occlusion/visibility

Hierarchical LOD

  • Hierarchical LODs allow multiple meshes to be combined and then reduced as a single mesh
  • Will also combine textures into atlases, reducing overall Material demands
  • Very useful for buildings and cities, groups of large meshes that need to be viewed at extreme distance
  • Requires Simplygon implementation

11.GPU Optimizations - What to do about all those pixels

Vertex Shader Optimization

  • Be careful how much you make use of World Position Offset
    • Often cheaper than the alternative methods of vertex animation
  • Vertex color can eventually get costly
    • On Paragon, we ended up stripping it and adding it back per instance

Pixel Shader Optimizations

Pixel Shader Don’ts

  • Too much math
  • Too many textures
  • Too many procedural functions
    • noise
  • Too many Material layers
  • Reliance on Ifs (if statements)
    • Both sides have to calculate

Pixel Shader Dos

  • Use textures for lookups instead of math
  • Compress greyscale maps into single textures
  • Minimize Layer usage
  • Use Switch Parameters to turn off what you don’t need

Material Instruction Counts

  • Always pay attention to Material instruction counts
  • Caution: the number indicated is not accurate until you click Apply
    • Sometimes it’s best to re-compile the Material just to be safe

Dealing with Overdraw

  • Overdraw is one of the leading causes of GPU slogging
  • Minimize the geometry area for overdraw
    • Adding vertices is almost always cheaper than relying on overdraw
    • For example, on A Boy and His Kite, we ended up cutting the grass planes to almost exactly match the outline of the grass texture alpha
  • Make use of the Particle Cutout property
    • This is found under the Cascade Required Module
    • Feed it a texture, and it automatically snips the sprite
    • Also works on SubUVs, with a different cutout for every frame

Managing Texture Resolution

  • Author textures at whatever resolution you like, but keep in mind you may not always use full resolution
  • Use the Texture Streaming view to see what level of mips you’re using for any given texture
  • You can use the Statistics panel set to Texture Statistics to see what level of mips you are using at current levels
  • Then use the Texture Editor to force mip bias
  • Or better yet, reimport at lower resolution

Lighting Considerations

  • Dynamic lights are expensive (but somewhat cheaper in deferred)

    • Small, unshadowed lights are the cheapest!
    • You can have lots of these
  • Minimize number of dynamic lights

  • Minimize number of things dynamic lights have to affect
  • Minimize dynamic light radii - tighter is better
  • Cast shadows from as few dynamic lights as possible
    • Dynamic shadow casting lights are the most expensive in UE
  • Watch out for Stationary Light Overlap

    • The fallback to dynamic lighting is extremely expensive
  • Bake whenever you can

  • Don’t assume you need dynamic lights
  • Use Mesh Distance Field shadows at distance
  • Watch out for dense shadow cascades
  • Many artifacts are cleaned up with Shadow Bias, but be gentle
  • Keep Lightmap Res as low as you can

    • Use the view mode, keep it blue as much as possible
  • Avoid Light Functions unless you really need them

    • Consider IES profiles, but understand they also have a cost
  • Lit translucency gets expensive, use with caution
  • Cull shadows early (at as close a distance as possible)
  • Cull dynamic lights as early as possible
  • Spot lights are cheaper than Point lights
  • Don’t be afraid to completely fake shadows
    • We do this a lot, especially for VR

Replication Optimization

  • Common problems for networking:
    • Doing too much
    • Doing it too much
  • Replicate as little as you can, as seldom as you can
  • Use net.* commands to check what’s going on
    • Must be run on the server
      • Use cheat net.* to run the command on the server from the client
    • Use net.DumpRelevantActors to see what is currently replicating
      • This command features some improvements as of 4.19
    • There are a lot of these net.* commands - check online for full list


Network Relevancy View Mode(4.19)


12.Content Streaming

Texture Streaming

  • Textures streaming into and out of your scene at inopportune times cause visible pops
  • As of 4.15 we have some tools for texture streaming diagnostics
    • Texture streaming view mode
      • Primitive Distance Accuracy
      • Mesh Densities Accuracy
      • Material Texture Scales Accuracy
      • Required Texture Resolution
    • stat streaming

Primitive Distance Accuracy

  • Visualization system for texture streaming
  • Enables users to see what mips the system thinks they should be using, allowing for intelligent mip limits

    • Red = 2 or more mips too few
    • Orange = 1 mip too few
    • White = the right degree of mips
    • Cyan = 1 mip too many
    • Green = 2 or more mips too many
  • This setting can be adjusted using the StreamingDistanceMultiplier property
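
The color legend above can be read as a simple lookup. In this sketch, mip_delta is the number of mips in use minus the number required; that sign convention and the function name are my own assumptions for illustration.

```python
def mip_accuracy_color(mip_delta):
    """Map a mip difference to the legend color listed above.

    mip_delta < 0 means too few mips, > 0 means too many (assumed convention).
    """
    if mip_delta <= -2:
        return "Red"
    if mip_delta == -1:
        return "Orange"
    if mip_delta == 0:
        return "White"
    if mip_delta == 1:
        return "Cyan"
    return "Green"


for delta in (-3, -1, 0, 1, 2):
    print(delta, mip_accuracy_color(delta))
```

The same legend is reused by the other accuracy view modes in this section.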

Mesh UV Densities Accuracy

  • This uses the density of a mesh’s UVs
  • Visualizes how those UV densities are contributing to streaming data
  • Use the same paradigm as Primitive Distance Accuracy

    • Red = 2 or more mips too few
    • Orange = 1 mip too few
    • White = the right degree of mips
    • Cyan = 1 mip too many
    • Green = 2 or more mips too many
  • Fixing this requires the UVs on each mesh to be adjusted

Material Texture Scales Accuracy

  • This view mode samples all textures and feeds back the worst culprits for over-streaming and under-streaming
  • Data is based on streaming affected by textures that have had their UVs scaled

  • Helps diagnose streaming errors caused by UV scaling

Required Texture Resolution

  • This mode shows the required resolution for the given texture, indicating how many mips under or over it is
  • Helps show the delta between the ideal resolution for the texture (what the GPU wants to show) and what is currently available
    • Red = 2 or more mips too few
    • Orange = 1 mip too few
    • White = the right degree of mips
    • Cyan = 1 mip too many
    • Green = 2 or more mips too many

Stat Streaming

  • Realtime metrics on texture streaming memory usage
  • Breaks down texture streaming memory into 3 pools
    • Texture
    • Streaming
    • Wanted

Level Streaming

  • Level streaming is an ideal way to control what content is in use in your game
  • What you currently need is streamed in, what you don’t is streamed out
  • Be careful how much you stream at once!
    • You may negate some of the benefit if you have over-referenced your content within code or Blueprint

Bonus: Level Streaming as Collaboration

  • Level Streaming is also the primary way for level artists to work together
  • Different aspects are separated into different levels
    • Not just different physical zones

World Composition

  • Specialized streaming system designed for large worlds
  • Will not work with old-school level streaming volumes
  • But WILL work with Blueprint streaming
    • Pro Tip: You can very easily make a Blueprint that functions just like a Level Streaming volume and does exactly the same thing, only better.


Collision Optimization


  • ViewMode -> Visibility Collision



Collision Analyzer


  • Window -> Developer Tools -> Collision Analyzer


Profile Data Visualizer


  • ProfileDataVisualizer(FRAME)
    • Scene
      • UpdateSceneObjectData
      • UpdateGlobalDistanceVolume
    • SlateUI
    • FRAME Leaf Events



Update Global Distance Volume

Referenced the question Update Global Distance Field Volume taking longer than usual

The global distance field is updated if any of the features using it are enabled:

  • Distance field particle collision

  • DistanceToNearestSurface material node

  • Shadow casting movable skylight

It also updates if Ray Traced Distance field shadows are enabled, but that’s a bug. You can workaround it by forcing global distance fields off with ‘r.AOGlobalDistanceField 0’.

I’ll assume you are actually using a feature that requires it. The global distance field is a cache around the camera that has to update if the camera moves a lot, or if you have a moving static mesh which has bAffectDistanceFieldLighting enabled. The bigger the static mesh, the more expensive the update will be. Use ‘r.AOGlobalDistanceFieldLogModifiedPrimitives 1’ to track down which objects it is and disable bAffectDistanceFieldLighting on them.


  • UpdateGlobalDistanceFieldVolume
    • CacheType MostlyStatic Clipmap0
    • CacheType MostlyStatic Clipmap1
    • CacheType MostlyStatic Clipmap2
    • CacheType MostlyStatic Clipmap3
    • CacheType Movable Clipmap0
      • GridCull
      • TileCullAndComposite 128x128x128
      • CompositeHeightfields
    • CacheType Movable Clipmap1
    • CacheType Movable Clipmap2
    • CacheType Movable Clipmap3


So I put a skylight back in, set it from moveable to stationary and it has solved the performance issue. Could this potentially be a bug? You told me Shadow Casting Moving Skylights cause it to be updated, however it appears that any moving skylight causes the update to the global distance field. Thanks for the hint that led to the answer :)



Distance field particle collision

Referenced Using Particle Collision Mode for Distance Fields


RayTraced Distance Field Soft Shadows

Referenced RayTraced Distance Field Soft Shadows

LOD Optimization (not yet organized)

GPU Performance for Game Artists


Tech Art Aid videos on Youtube










