NoiseBall Series Study Notes

Posted by FlowingCrescent on 2021-08-31
Estimated Reading Time 14 Minutes
Words 3k In Total

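Work-related content isn't something I can really post, so here is something completely unrelated to my projects... This time I picked Keijiro's NoiseBall series. NoiseBall2 is an example of a Compute Shader combined with DrawMeshInstancedIndirect, NoiseBall3 uses DrawProcedural, and NoiseBall5 brings in the Job System, so together they work well as an introductory tutorial to these APIs.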

NoiseBall2

Github

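The principle is clear and simple: a Compute Shader computes the position of every vertex of every triangle, the resulting vertex-position buffer is passed to the material, and DrawMeshInstancedIndirect() renders everything as one batch.
In its basic principle, this is essentially the same as Colin's grass-rendering project.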

So the notes below are organized around these two pieces: DrawMeshInstancedIndirect and the Compute Shader.

DrawMeshInstancedIndirect

Official documentation
In the project, the function is called like this:

Graphics.DrawMeshInstancedIndirect(
_mesh, 0, _material,
new Bounds(transform.position, transform.lossyScale * 5),
_drawArgsBuffer, 0
);

Graphics.DrawMeshInstancedIndirect(Mesh mesh, int submeshIndex, Material material, Bounds bounds, ComputeBuffer bufferWithArgs, int argsOffset)

The first parameter is a single-triangle Mesh, which the script generates by itself:

_mesh = new Mesh();
_mesh.vertices = new Vector3 [3];
_mesh.SetIndices(new [] {0, 1, 2}, MeshTopology.Triangles, 0);
_mesh.UploadMeshData(true);

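The second argument (an int) selects which submesh to draw; no submeshes are needed here, so it is 0. The third is the material, and the fourth is a Bounds used for fast culling on the CPU side. Because the vertex positions are only produced on the GPU, Unity cannot compute these bounds itself, so they must be large enough to enclose the whole effect.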

The fifth argument, the ComputeBuffer, needs more care: both its contents and their order are prescribed:

Buffer with arguments, bufferWithArgs, has to have five integer numbers at given argsOffset offset: index count per instance, instance count, start index location, base vertex location, start instance location.

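That is: the index count per instance (the number of indices in the mesh, 3 for our single triangle), the number of instances to draw, the start index location, the base vertex location, and the start instance location. The three "locations" are offsets rather than positions in space: the start index is where reading begins in the index buffer, the base vertex is added to every index before the vertex is fetched, and the start instance is an offset for per-instance data. All three are 0 in this project.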
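To make the five arguments concrete, here is a small CPU-side Python sketch of how an indexed, instanced draw walks its index buffer. It is an illustration only: the function name expand_indexed_draw is hypothetical, it follows the usual indexed-draw semantics (start index offsets the read position in the index buffer, base vertex is added to each fetched index), and it keeps start instance at 0 because how that offset becomes visible in a shader differs between graphics APIs.

```python
def expand_indexed_draw(indices, args):
    """Return the (instance_id, vertex_index) pairs an indexed instanced draw processes.

    args mirrors the 5-uint IndirectArguments layout:
    [index count per instance, instance count, start index, base vertex, start instance]
    """
    index_count, instance_count, start_index, base_vertex, start_instance = args
    # This sketch only models start_instance == 0 (its shader-visible effect is API-specific).
    assert start_instance == 0, "start instance offset is not modelled in this sketch"
    fetched = []
    for instance_id in range(instance_count):
        for i in range(index_count):
            # Read the index buffer starting at start_index,
            # then offset the result by base_vertex before fetching the vertex.
            vertex_index = indices[start_index + i] + base_vertex
            fetched.append((instance_id, vertex_index))
    return fetched


# NoiseBall2-style arguments: one 3-index triangle, drawn TriangleCount times.
triangle_count = 4
pairs = expand_indexed_draw([0, 1, 2], [3, triangle_count, 0, 0, 0])
print(len(pairs))   # 12
print(pairs[:4])    # [(0, 0), (0, 1), (0, 2), (1, 0)]
```

With NoiseBall2's arguments {3, TriangleCount, 0, 0, 0}, the GPU simply runs the vertex shader 3 times per instance, and each instance sees vertex indices 0, 1, 2.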
In the project, the buffer is allocated like this:

_drawArgsBuffer = new ComputeBuffer(
1, 5 * sizeof(uint), ComputeBufferType.IndirectArguments
);
//...

_drawArgsBuffer.SetData(new uint[5] {3, (uint)TriangleCount, 0, 0, 0});

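Note that ComputeBufferType.IndirectArguments is the buffer type required when a ComputeBuffer is passed as the argument buffer of DrawMeshInstancedIndirect.
The first value, 3, is the index count of our single triangle, and TriangleCount is the number of triangles we chose, which becomes the instance count.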

Compute Shader

For the Compute Shader, pasting the code is the most direct explanation:

#pragma kernel Update

#include "UnityCG.cginc"
#include "SimplexNoise3D.cginc"

RWStructuredBuffer<float4> PositionBuffer;
RWStructuredBuffer<float4> NormalBuffer;

CBUFFER_START(Params)
float Time;
float Extent;
float NoiseAmplitude;
float NoiseFrequency;
float3 NoiseOffset;
CBUFFER_END

float Random(float u, float v)
{
float f = dot(float2(12.9898, 78.233), float2(u, v));
return frac(43758.5453 * sin(f));
}

float3 RandomPoint(float id)
{
float u = Random(id * 0.01334, 0.3728) * UNITY_PI * 2;
float z = Random(0.8372, id * 0.01197) * 2 - 1;
return float3(float2(cos(u), sin(u)) * sqrt(1 - z * z), z);
}

[numthreads(64, 1, 1)]
void Update(uint id : SV_DispatchThreadID)
{
// One thread computes the three vertices of one triangle
int idx1 = id * 3;
int idx2 = id * 3 + 1;
int idx3 = id * 3 + 2;

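// The seed changes once per second, staggered per triangle by id * 0.1; it drives three pseudo-random points on the unit sphere (components in [-1, 1])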
float seed = floor(Time + id * 0.1) * 0.1;
float3 v1 = RandomPoint(idx1 + seed);
float3 v2 = RandomPoint(idx2 + seed);
float3 v3 = RandomPoint(idx3 + seed);

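// v1 is the first vertex; v2 and v3 are moved a step of length Extent from v1 toward their random points, then projected back onto the unit sphere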
v2 = normalize(v1 + normalize(v2 - v1) * Extent);
v3 = normalize(v1 + normalize(v3 - v1) * Extent);

// NoiseFrequency is fixed; NoiseOffset grows by deltaTime every frame, which animates the noise
float l1 = snoise(v1 * NoiseFrequency + NoiseOffset).w;
float l2 = snoise(v2 * NoiseFrequency + NoiseOffset).w;
float l3 = snoise(v3 * NoiseFrequency + NoiseOffset).w;

// Cube each noise value (and take abs) to strengthen the contrast
l1 = abs(l1 * l1 * l1);
l2 = abs(l2 * l2 * l2);
l3 = abs(l3 * l3 * l3);

// Displace each vertex radially by its noise value
v1 *= 1 + l1 * NoiseAmplitude;
v2 *= 1 + l2 * NoiseAmplitude;
v3 *= 1 + l3 * NoiseAmplitude;

float3 n = normalize(cross(v2 - v1, v3 - v2));

PositionBuffer[idx1] = float4(v1, 0);
PositionBuffer[idx2] = float4(v2, 0);
PositionBuffer[idx3] = float4(v3, 0);

NormalBuffer[idx1] = float4(n, 0);
NormalBuffer[idx2] = float4(n, 0);
NormalBuffer[idx3] = float4(n, 0);
}

The Compute Shader really is simple: it generates random triangles on the unit sphere and then displaces them with noise.
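To see what the kernel produces without a GPU, here is a minimal Python port of its random-point, seed and Extent logic. It is a sketch under a few assumptions: HLSL's frac(x) is taken to be x - floor(x), Python evaluates sin in double precision (so the numbers will not match the GPU bit for bit), and the noise displacement is left out because SimplexNoise3D.cginc is not reproduced here.

```python
import math

def frac(x):
    # HLSL frac: x - floor(x)
    return x - math.floor(x)

def random(u, v):
    # The sin-based hash used by the kernel
    f = 12.9898 * u + 78.233 * v
    return frac(43758.5453 * math.sin(f))

def random_point(id_):
    # Point on the unit sphere: random azimuth u, random height z in [-1, 1)
    u = random(id_ * 0.01334, 0.3728) * math.pi * 2
    z = random(0.8372, id_ * 0.01197) * 2 - 1
    r = math.sqrt(1 - z * z)
    return (math.cos(u) * r, math.sin(u) * r, z)

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)

def triangle(tri_id, time, extent):
    # The seed changes once per second, staggered per triangle by tri_id * 0.1
    seed = math.floor(time + tri_id * 0.1) * 0.1
    v1 = random_point(tri_id * 3 + 0 + seed)
    v2 = random_point(tri_id * 3 + 1 + seed)
    v3 = random_point(tri_id * 3 + 2 + seed)
    # Step from v1 toward v2/v3 by `extent`, then project back onto the sphere
    v2 = normalize(add(v1, scale(normalize(sub(v2, v1)), extent)))
    v3 = normalize(add(v1, scale(normalize(sub(v3, v1)), extent)))
    return v1, v2, v3

print(triangle(5, 0.3, 0.1))
```

Two properties are easy to check with it: all three vertices stay on the unit sphere, and Extent = 0 collapses v2 and v3 onto v1, which is why Extent effectively controls the triangle size.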

The material shader used with DrawMeshInstancedIndirect then reads these buffers directly:

#if defined(ENABLE_INSTANCING)

StructuredBuffer<float4> _PositionBuffer;
StructuredBuffer<float4> _NormalBuffer;

#endif

void vert(inout appdata v)
{
#if defined(ENABLE_INSTANCING)

uint id = unity_InstanceID * 3 + v.vid;

v.vertex.xyz = _PositionBuffer[id].xyz;
v.normal = _NormalBuffer[id].xyz;

#endif
}

void setup()
{
unity_ObjectToWorld = _LocalToWorld;
unity_WorldToObject = _WorldToLocal;
}

void surf(Input IN, inout SurfaceOutputStandard o)
{
o.Albedo = _Color.rgb;
o.Metallic = _Metallic;
o.Smoothness = _Smoothness;
o.Normal = float3(0, 0, IN.vface < 0 ? -1 : 1);
}
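The contract between the two shaders is just an index convention: the kernel writes triangle id to slots id * 3 to id * 3 + 2, and the vertex shader reads slot unity_InstanceID * 3 + v.vid. The small Python check below (hypothetical helpers, not part of the project) mimics both sides and confirms that, with draw arguments {3, TriangleCount, 0, 0, 0}, every slot is read exactly once and each instance reads exactly its own triangle.

```python
def compute_pass(triangle_count):
    # Mimic the kernel: thread `tid` writes its three vertices to tid * 3 + k
    buffer = [None] * (triangle_count * 3)
    for tid in range(triangle_count):
        for k in range(3):
            buffer[tid * 3 + k] = ("triangle", tid, "vertex", k)
    return buffer

def draw_pass(buffer, triangle_count):
    # Mimic the vertex shader: instance i, vertex vid reads slot i * 3 + vid
    reads = []
    for instance_id in range(triangle_count):
        for vid in range(3):
            reads.append(buffer[instance_id * 3 + vid])
    return reads

buf = compute_pass(64)
reads = draw_pass(buf, 64)
print(reads[3])   # ('triangle', 1, 'vertex', 0)
```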

NoiseBall 3

Github

The result is even more striking than the previous one, but the principle is hardly more complex, and the code is actually shorter.

DrawProcedural

Official documentation

Graphics.DrawProcedural(
_material,
new Bounds(transform.position, transform.lossyScale * 5),
MeshTopology.Triangles, _triangleCount * 3, 1,
null, null,
ShadowCastingMode.TwoSided, true, gameObject.layer
);

public static void DrawProcedural(Material material, Bounds bounds, MeshTopology topology, int vertexCount, int instanceCount, Camera camera, MaterialPropertyBlock properties, Rendering.ShadowCastingMode castShadows, bool receiveShadows, int layer);
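There seem to be many parameters, but they are all simple ones to set.
Everything else in the script is SetFloat-style property setup; these few lines are really the only key part.
The most interesting parameter is probably camera, which restricts drawing to a specific camera; if it is null, the geometry is drawn in all cameras.
Note also that no mesh is passed at all: the call only asks for _triangleCount * 3 vertices, and the vertex shader creates their positions.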

Shader

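This time the vertex animation is computed directly in the vertex shader.
The key is uint vid : SV_VertexID, from which the shader derives the triangle index and the vertex index within the triangle for the rest of the computation.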

Shader "Hidden/NoiseBall3"
{
SubShader
{
Tags { "RenderType"="Opaque" }

Cull Off

CGPROGRAM

#pragma surface surf Standard vertex:vert addshadow
#pragma target 3.0

#include "UnityCG.cginc"
#include "SimplexNoise3D.cginc"

// Hash function from H. Schechter & R. Bridson, goo.gl/RXiKaH
uint Hash(uint s)
{
s ^= 2747636419u;
s *= 2654435769u;
s ^= s >> 16;
s *= 2654435769u;
s ^= s >> 16;
s *= 2654435769u;
return s;
}

// Random number (0-1)
float Random(uint seed)
{
return float(Hash(seed)) / 4294967295.0; // 2^32-1
}

// Random point on unit sphere
float3 RandomPoint(uint seed)
{
float u = Random(seed * 2 + 0) * UNITY_PI * 2;
float z = Random(seed * 2 + 1) * 2 - 1;
return float3(float2(cos(u), sin(u)) * sqrt(1 - z * z), z);
}

struct appdata
{
float4 vertex : POSITION;
float3 normal : NORMAL;
float4 tangent : TANGENT;
float4 color : COLOR;
float4 texcoord1 : TEXCOORD1;
float4 texcoord2 : TEXCOORD2;
uint vid : SV_VertexID;
};

struct Input
{
float vface : VFACE;
float4 color : COLOR;
};

half4 _Color;
half _Smoothness;
half _Metallic;
half3 _Emission;

uint _TriangleCount;
float _LocalTime;
float _Extent;
float _NoiseAmplitude;
float _NoiseFrequency;
float3 _NoiseOffset;

float4x4 _LocalToWorld;
float4x4 _WorldToLocal;

void vert(inout appdata v)
{
uint t_idx = v.vid / 3; // Triangle index
uint v_idx = v.vid - t_idx * 3; // Vertex index

// Time dependent random number seed
uint seed = _LocalTime + (float)t_idx / _TriangleCount;
seed = ((seed << 16) + t_idx) * 4;

// Random triangle on unit sphere
float3 v1 = RandomPoint(seed + 0);
float3 v2 = RandomPoint(seed + 1);
float3 v3 = RandomPoint(seed + 2);

// Constraint with the extent parameter
v2 = normalize(v1 + normalize(v2 - v1) * _Extent);
v3 = normalize(v1 + normalize(v3 - v1) * _Extent);

// Displacement by noise field
float l1 = snoise(v1 * _NoiseFrequency + _NoiseOffset).w;
float l2 = snoise(v2 * _NoiseFrequency + _NoiseOffset).w;
float l3 = snoise(v3 * _NoiseFrequency + _NoiseOffset).w;

l1 = abs(l1 * l1 * l1);
l2 = abs(l2 * l2 * l2);
l3 = abs(l3 * l3 * l3);

v1 *= 1 + l1 * _NoiseAmplitude;
v2 *= 1 + l2 * _NoiseAmplitude;
v3 *= 1 + l3 * _NoiseAmplitude;

// Vertex position/normal modification
v.vertex.xyz = v_idx == 0 ? v1 : (v_idx == 1 ? v2 : v3);
v.normal = normalize(cross(v2 - v1, v3 - v2));

// Random emission
v.color = 0.95 < Random(seed + 3);

// Transform modification
unity_ObjectToWorld = _LocalToWorld;
unity_WorldToObject = _WorldToLocal;
}

void surf(Input IN, inout SurfaceOutputStandard o)
{
o.Albedo = _Color.rgb;
o.Metallic = _Metallic;
o.Smoothness = _Smoothness;
o.Normal = float3(0, 0, IN.vface < 0 ? -1 : 1); // back face support
o.Emission = _Emission * IN.color.rgb;
}

ENDCG
}
FallBack "Diffuse"
}
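The bookkeeping in vert() (splitting SV_VertexID into triangle and vertex indices, building the per-triangle seed, and the Schechter & Bridson hash) can be mirrored in Python to see what it produces. This is a sketch: uint arithmetic is emulated by masking to 32 bits, float precision differs from the GPU, and the noise displacement is left out. The function names are hypothetical.

```python
import math

MASK = 0xFFFFFFFF  # emulate 32-bit uint wrap-around

def hash_u32(s):
    # Hash from H. Schechter & R. Bridson, as in the shader
    s = (s ^ 2747636419) & MASK
    s = (s * 2654435769) & MASK
    s ^= s >> 16
    s = (s * 2654435769) & MASK
    s ^= s >> 16
    s = (s * 2654435769) & MASK
    return s

def random01(seed):
    # Map the 32-bit hash to [0, 1]
    return hash_u32(seed & MASK) / 4294967295.0

def random_point(seed):
    u = random01(seed * 2 + 0) * math.pi * 2
    z = random01(seed * 2 + 1) * 2 - 1
    r = math.sqrt(1 - z * z)
    return (math.cos(u) * r, math.sin(u) * r, z)

def vertex_seed(vid, local_time, triangle_count):
    t_idx = vid // 3            # triangle index
    v_idx = vid - t_idx * 3     # vertex index within the triangle (0, 1, 2)
    # float -> uint truncates, so the seed steps once per time unit,
    # offset by t_idx / triangle_count to stagger triangles over time
    seed = int(local_time + t_idx / triangle_count)
    # time step in the upper bits, triangle index in the lower 16 bits;
    # * 4 leaves room for seed + 0..3 (three points plus the emission roll)
    seed = (((seed << 16) + t_idx) * 4) & MASK
    return t_idx, v_idx, seed

t_idx, v_idx, seed = vertex_seed(7, 2.5, 100)
print(t_idx, v_idx, seed)   # 2 1 524296
```

All three vertices of one triangle get the same seed, so each vertex invocation recomputes the whole triangle and then keeps only its own corner (the v_idx == 0 ? v1 : ... line). That redundancy is what lets the shader work without any buffers.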

NoiseBall 5

Github
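This time the Job System is used to generate the mesh and compute the vertex motion entirely on the CPU, so we only need to look at the NoiseBallRenderer.cs script.
Each frame it does the following: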

void Update()
{
if (_indexBuffer.Length != VertexCount)
{
// The vertex count was changed, or this is the first update.

// Dispose the current mesh.
_mesh.Clear();
DisposeBuffers();

// Mesh reallocation and reconstruction
AllocateBuffers();
UpdateVertexBuffer();
InitializeMesh();
}
else
{
// Only update the vertex data.
UpdateVertexBuffer();
UpdateMesh();
}
}

Let's start with DisposeBuffers() and AllocateBuffers():

// Release the previously allocated memory
void DisposeBuffers()
{
if (_indexBuffer.IsCreated) _indexBuffer.Dispose();
if (_vertexBuffer.IsCreated) _vertexBuffer.Dispose();
}

void AllocateBuffers()
{
_indexBuffer = new NativeArray<uint>(
VertexCount, Allocator.Persistent,
NativeArrayOptions.UninitializedMemory
);

_vertexBuffer = new NativeArray<float3>(
VertexCount * 2, Allocator.Persistent,
NativeArrayOptions.UninitializedMemory
);

// Index array initialization
for (var i = 0; i < VertexCount; i++) _indexBuffer[i] = (uint)i;
}

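AllocateBuffers() simply allocates two NativeArrays; it is interesting that their element types are uint and float3, which match the GPU-side types.
_indexBuffer stores the indices, while _vertexBuffer stores positions and normals (hence VertexCount * 2 elements).
Allocator describes how the memory is allocated; Persistent means the allocation lives across frames until it is explicitly disposed.
NativeArrayOptions.UninitializedMemory requests memory that is not cleared to zero, which saves the clearing cost since every element is written anyway.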

InitializeMesh() runs once at the start, and again whenever the vertex count changes:

void InitializeMesh()
{
_mesh.SetVertexBufferParams(
VertexCount,
new VertexAttributeDescriptor
(VertexAttribute.Position, VertexAttributeFormat.Float32, 3),
new VertexAttributeDescriptor
(VertexAttribute.Normal, VertexAttributeFormat.Float32, 3)
);
_mesh.SetVertexBufferData(_vertexBuffer, 0, 0, VertexCount * 2);

_mesh.SetIndexBufferParams(VertexCount, IndexFormat.UInt32);
_mesh.SetIndexBufferData(_indexBuffer, 0, 0, VertexCount);

_mesh.SetSubMesh(0, new SubMeshDescriptor(0, VertexCount));
_mesh.bounds = new Bounds(Vector3.zero, Vector3.one * 1000);
}

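Mesh.SetVertexBufferParams sets the size and layout of the vertex buffer; it is easy to understand, and the way the layout is declared is neat.
SetVertexBufferData then fills that buffer with the memory allocated earlier: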
SetVertexBufferData(NativeArray<T> data, int dataStart, int meshBufferStart, int count);

count:Number of vertices to copy.

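The documentation says "vertices", but in practice the start offsets and count appear to be measured in elements of T. Here T is float3, and each vertex (position + normal) takes two float3, so copying VertexCount vertices means passing VertexCount * 2.
After that it just sets the SubMesh and the bounds; nothing special there.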
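The `* 2` can be checked with a little byte arithmetic. The Python sketch below assumes that SetVertexBufferData's count is measured in elements of T (here float3); it derives the vertex stride from the two Float32 x 3 attributes passed to SetVertexBufferParams and converts a vertex count into a count of float3 elements. The helper names are made up for illustration.

```python
FLOAT32_SIZE = 4

# (attribute, component count), mirroring the two VertexAttributeDescriptors
layout = [("Position", 3), ("Normal", 3)]

def vertex_stride(attributes):
    # All attributes are Float32, so every component takes 4 bytes
    return sum(components * FLOAT32_SIZE for _, components in attributes)

def elements_to_copy(vertex_count, attributes, element_size):
    # Number of T elements needed to cover vertex_count whole vertices
    total_bytes = vertex_count * vertex_stride(attributes)
    assert total_bytes % element_size == 0, "vertex data must be a whole number of elements"
    return total_bytes // element_size

float3_size = 3 * FLOAT32_SIZE    # sizeof(float3) == 12
vertex_count = 3 * 1000           # e.g. 1000 triangles
print(vertex_stride(layout))                                # 24
print(elements_to_copy(vertex_count, layout, float3_size))  # 6000, i.e. vertex_count * 2
```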

UpdateVertexBuffer() and UpdateMesh() both run every frame, and the former is the key to this effect.
The latter is simple: it just uploads the vertex buffer data again every frame.

void UpdateMesh()
{
_mesh.SetVertexBufferData(_vertexBuffer, 0, 0, VertexCount * 2);
}

UpdateVertexBuffer() is shown below: it creates a VertexUpdateJob struct, fills in its fields, then calls Schedule() followed by Complete():

void UpdateVertexBuffer()
{
var job = new VertexUpdateJob{
seed = _randomSeed,
extent = _extent,
noiseFrequency = _noiseFrequency,
noiseOffset = _noiseAnimation * Time.time,
noiseAmplitude = _noiseAmplitude,
buffer = _vertexBuffer
};

job.Schedule((int)_triangleCount, 64).Complete();
}

So the VertexUpdateJob struct is the most important part.

Job

IJobParallelFor
First, the full code of the job:

[BurstCompile(CompileSynchronously = true,
FloatMode = FloatMode.Fast, FloatPrecision = FloatPrecision.Low)]
struct VertexUpdateJob : IJobParallelFor
{

[ReadOnly] public uint seed;
[ReadOnly] public float extent;
[ReadOnly] public float noiseFrequency;
[ReadOnly] public float3 noiseOffset;
[ReadOnly] public float noiseAmplitude;

[NativeDisableParallelForRestriction]
[WriteOnly] public NativeArray<float3> buffer;

Random _random;

float3 RandomPoint()
{
var u = _random.NextFloat(math.PI * 2);
var z = _random.NextFloat(-1, 1);
var xy = math.sqrt(1 - z * z);
return math.float3(math.cos(u) * xy, math.sin(u) * xy, z);
}

public void Execute(int i)
{
_random = new Random(seed + (uint)i * 10);
_random.NextInt();

var v1 = RandomPoint();
var v2 = RandomPoint();
var v3 = RandomPoint();

v2 = math.normalize(v1 + math.normalize(v2 - v1) * extent);
v3 = math.normalize(v1 + math.normalize(v3 - v1) * extent);

var l1 = noise.snoise(v1 * noiseFrequency + noiseOffset);
var l2 = noise.snoise(v2 * noiseFrequency + noiseOffset);
var l3 = noise.snoise(v3 * noiseFrequency + noiseOffset);

l1 = math.abs(l1 * l1 * l1);
l2 = math.abs(l2 * l2 * l2);
l3 = math.abs(l3 * l3 * l3);

v1 *= 1 + l1 * noiseAmplitude;
v2 *= 1 + l2 * noiseAmplitude;
v3 *= 1 + l3 * noiseAmplitude;

var n = math.cross(v2 - v1, v3 - v1);

var offs = i * 6;

buffer[offs++] = v1;
buffer[offs++] = n;
buffer[offs++] = v2;
buffer[offs++] = n;
buffer[offs++] = v3;
buffer[offs++] = n;
}
}

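The core computation is the same as in the earlier NoiseBalls, so let's focus on what is different.
The struct starts with a BurstCompile attribute carrying several settings: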
[BurstCompile(CompileSynchronously = true,FloatMode = FloatMode.Fast, FloatPrecision = FloatPrecision.Low)]
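Official documentation
These settings configure the Burst compiler, which compiles the job into optimized native code.
The first one, CompileSynchronously, does what its name says: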

Gets or sets whether or not to burst compile the code immediately on first use, or in the background over time.

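That is, whether the job is Burst-compiled right away on first use; if it is false, compilation happens in the background over time, and in the Editor the job runs as ordinary managed code until it finishes.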

FloatMode = FloatMode.Fast
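allows "algebraically equivalent optimizations": the compiler may reorder or rewrite floating-point math in ways that are equal in exact arithmetic but can change the rounded result slightly, in exchange for speed. See the official documentation for details.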
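Why an "algebraically equivalent" rewrite can change results: floating-point addition is not associative, so reordering a sum (the kind of transformation FloatMode.Fast permits) may change the last bits. A two-line illustration with ordinary Python doubles:

```python
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)      # 0.6000000000000001 0.6
print(left == right)    # False
```

For a visual effect like NoiseBall such tiny differences are invisible, which is why trading them for speed is a reasonable choice here.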

FloatPrecision = FloatPrecision.Low
lowers the required accuracy of math functions (such as sin and cos), trading precision for faster computation.

Next come the many fields marked [ReadOnly]:

By declaring it as read only, multiple jobs are allowed to access the data in parallel

Marking data as ReadOnly lets multiple jobs access it in parallel.

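The buffer that the job writes to is marked with
[NativeDisableParallelForRestriction]
Unfortunately the official documentation says almost nothing about this attribute. It does not turn off parallelism: by default, the safety system only lets Execute(i) of an IJobParallelFor write a NativeArray at index i, but here each Execute(i) writes the six elements i * 6 to i * 6 + 5, so that check has to be disabled. The job itself then becomes responsible for avoiding overlapping writes, which holds here because each triangle owns its own range.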

Then a Random struct (Unity.Mathematics.Random) is created and its functions are used, for example:

/// <summary>Returns a uniformly random float value in the interval [0, 1).</summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public float NextFloat()
{
return asfloat(0x3f800000 | (NextState() >> 9)) - 1.0f;
}

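It returns a pseudo-random float, and the bit manipulation is not obvious at first glance: 0x3f800000 is the bit pattern of 1.0f, and OR-ing in 23 random bits as the mantissa gives a float spread uniformly over [1, 2), so subtracting 1.0f maps it to [0, 1).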
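The mantissa trick can be reproduced with Python's struct module to check the end points of the range. Assumptions: the state update is a plain xorshift32 with shifts 13, 17, 5, which is my understanding of what Unity.Mathematics.Random uses, but treat it as illustrative; float_from_state mirrors only the bit manipulation of NextFloat() shown above, and both names are hypothetical.

```python
import struct

MASK = 0xFFFFFFFF

def as_float(bits):
    # Reinterpret a 32-bit pattern as an IEEE-754 float32 (like math.asfloat)
    return struct.unpack("<f", struct.pack("<I", bits & MASK))[0]

def float_from_state(state):
    # 0x3f800000 is 1.0f (sign 0, exponent 127, mantissa 0).
    # state >> 9 keeps the top 23 bits, which become the mantissa,
    # so the float lies in [1, 2); subtracting 1 gives [0, 1).
    return as_float(0x3F800000 | (state >> 9)) - 1.0

def xorshift32(state):
    # Illustrative xorshift32 step (assumed to resemble Unity.Mathematics.Random)
    state ^= (state << 13) & MASK
    state ^= state >> 17
    state ^= (state << 5) & MASK
    return state

state = 12345
samples = []
for _ in range(5):
    samples.append(float_from_state(state))  # use the current state...
    state = xorshift32(state)                # ...then advance it
print(samples)
```

The smallest state gives exactly 0.0 and the largest gives 1 - 2^-23, so 1.0 itself is never returned.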

The other functions, such as normalize and snoise, also come from Unity.Mathematics.

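The actual computation is no different from before.
From the order in which buffer is written, we can also see that positions and normals are stored interleaved per vertex (position, normal, position, normal, ...), rather than all positions followed by all normals.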
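A Python sketch of the same write pattern, with made-up tagged values instead of real vectors, shows two things at once: triangle i fills the six float3 slots i * 6 to i * 6 + 5 as position, normal, position, normal, ..., so vertex k's position sits at 2k and its normal at 2k + 1; and different i never touch the same slot, which is why disabling the parallel-for index restriction is safe here. execute() is a hypothetical stand-in for the job's Execute, not the project's code.

```python
def execute(i, buffer, v1, v2, v3, n):
    # Same write order as VertexUpdateJob.Execute: interleave position and normal
    offs = i * 6
    for value in (v1, n, v2, n, v3, n):
        buffer[offs] = value
        offs += 1
    return range(i * 6, offs)  # slots written by this iteration

triangle_count = 4
buffer = [None] * (triangle_count * 6)   # VertexCount * 2 float3 slots
written = []
for i in range(triangle_count):
    # Placeholder data: tag positions and normals so the layout is visible
    slots = execute(i, buffer, ("p", i, 0), ("p", i, 1), ("p", i, 2), ("n", i))
    written.append(set(slots))

def position_of(k):
    return buffer[2 * k]       # k = global vertex index

def normal_of(k):
    return buffer[2 * k + 1]

print(position_of(4), normal_of(4))   # ('p', 1, 1) ('n', 1)
```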

About the call job.Schedule((int)_triangleCount, 64).Complete(); on the main thread:
Unity's comments on IJobParallelFor describe its parameters as follows:

Schedule a parallel-for job. First parameter is how many for-each iterations to perform.
The second parameter is the batch size, essentially the no-overhead innerloop that just invokes Execute(i) in a loop.
When there is a lot of work in each iteration then a value of 1 can be sensible.
When there is very little work values of 32 or 64 can make sense.

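The first parameter is the total number of iterations, i.e. how many times Execute(i) is called (here once per triangle).
The second parameter is the batch size: how many consecutive iterations a worker runs in a tight loop that just calls Execute(i), without scheduling overhead in between.
When each iteration does a lot of work, 1 is reasonable; when each does very little, 32 or 64 works well. Complete() then blocks the main thread until all iterations have finished.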
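Conceptually, the batch size splits the index range into chunks, and a worker runs Execute(i) over a whole chunk in a plain loop. The Python sketch below imitates that shape with ThreadPoolExecutor; schedule_parallel_for is a hypothetical name, and this is only an analogy for how the work is divided, not how Unity's job system is implemented (Python threads also do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def schedule_parallel_for(execute, array_length, batch_size, max_workers=4):
    # Split [0, array_length) into batches of at most batch_size indices
    batches = [range(start, min(start + batch_size, array_length))
               for start in range(0, array_length, batch_size)]

    def run_batch(batch):
        # The "no-overhead inner loop": call execute(i) for every index in the batch
        for i in batch:
            execute(i)

    # Leaving the with-block waits for every batch, similar to calling Complete()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run_batch, batches))
    return batches

results = [0] * 1000

def execute(i):
    results[i] = i * i   # each index writes only its own slot

batches = schedule_parallel_for(execute, len(results), 64)
print(len(batches))      # 16 (15 full batches and a final one of 40)
```

With 1000 iterations and a batch size of 64, the scheduler only has to hand out 16 work items instead of 1000, which is the point of a larger batch size when each Execute(i) is cheap.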

For the other Schedule parameters and overloads, see the official documentation.

This time I jumped straight into the API without a big-picture understanding of the Job System, so I still feel a bit confused.

