NoiseBall Series Study Notes - Sand and Foam

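I can't really post work-related content, so here is something completely unrelated to my project. This time I picked Keijiro's NoiseBall series: NoiseBall2 is an example of using a Compute Shader with DrawMeshInstancedIndirect, NoiseBall3 is an example of DrawProcedural, and NoiseBall5 brings in the Job System. Together they work well as an introductory tutorial to these APIs.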

NoiseBall2

Github

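The principle is fairly clear: a Compute Shader computes the position of every vertex of every triangle, the position buffer is then passed to the material, and DrawMeshInstancedIndirect() renders all triangles in a single instanced draw.
The basic idea is essentially the same as in Colin's grass-rendering project.

The notes below therefore follow these two pieces, the draw call and the Compute Shader, in turn.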

DrawMeshInstancedIndirect

Official documentation
The project calls this function as follows:

Graphics.DrawMeshInstancedIndirect(
    _mesh, 0, _material,
    new Bounds(transform.position, transform.lossyScale * 5),
    _drawArgsBuffer, 0
);

Graphics.DrawMeshInstancedIndirect(Mesh mesh, int submeshIndex, Material material, Bounds bounds, ComputeBuffer bufferWithArgs, int argsOffset)

The first argument is a single-triangle Mesh, which is generated from script:

_mesh = new Mesh();
_mesh.vertices = new Vector3 [3];
_mesh.SetIndices(new [] {0, 1, 2}, MeshTopology.Triangles, 0);
_mesh.UploadMeshData(true);

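The second argument, an int, selects which submesh to draw; this mesh has only one, so it is 0. The third is the material, and the fourth is a Bounds used for fast culling on the CPU.

The fifth argument, the ComputeBuffer, is more particular: both its contents and their order are prescribed: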

Buffer with arguments, bufferWithArgs, has to have five integer numbers at given argsOffset offset: index count per instance, instance count, start index location, base vertex location, start instance location.

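That is: the index count per instance (3 for one triangle), the number of instances to draw, the start index, the base vertex, and the start instance. (The "locations" here presumably all mean offsets counted in elements, not positions in space.)
The project allocates the buffer like this: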

      _drawArgsBuffer = new ComputeBuffer(
          1, 5 * sizeof(uint), ComputeBufferType.IndirectArguments
      );
//...

_drawArgsBuffer.SetData(new uint[5] {3, (uint)TriangleCount, 0, 0, 0});

Note that ComputeBufferType.IndirectArguments is the type to specify when a ComputeBuffer is used as the argument buffer of DrawMeshInstancedIndirect.
TriangleCount is the number of triangles we specified.
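To make the layout concrete, here is a small sketch in plain Python (not Unity code, so it can run outside the editor) that packs the same five values as five consecutive 32-bit unsigned integers. The helper name `pack_draw_args`, the little-endian byte order, and the instance count of 1000 are assumptions made for illustration only.

```python
import struct

def pack_draw_args(index_count_per_instance, instance_count,
                   start_index=0, base_vertex=0, start_instance=0):
    """Pack the five draw arguments as consecutive uint32 values.

    Little-endian ("<") is assumed here purely for the demonstration.
    """
    return struct.pack("<5I", index_count_per_instance, instance_count,
                       start_index, base_vertex, start_instance)

# One triangle (3 indices) per instance, 1000 instances (hypothetical count).
args = pack_draw_args(3, 1000)
print(len(args))                   # 20 bytes, i.e. 5 * sizeof(uint)
print(struct.unpack("<5I", args))  # (3, 1000, 0, 0, 0)
```

The 20-byte size matches the `5 * sizeof(uint)` stride passed to the ComputeBuffer constructor above.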

Compute Shader

For the Compute Shader it is clearest to show the code directly:

#pragma kernel Update

#include "UnityCG.cginc"
#include "SimplexNoise3D.cginc"

RWStructuredBuffer<float4> PositionBuffer;
RWStructuredBuffer<float4> NormalBuffer;

CBUFFER_START(Params)
    float Time;
    float Extent;
    float NoiseAmplitude;
    float NoiseFrequency;
    float3 NoiseOffset;
CBUFFER_END

float Random(float u, float v)
{
    float f = dot(float2(12.9898, 78.233), float2(u, v));
    return frac(43758.5453 * sin(f));
}

float3 RandomPoint(float id)
{
    float u = Random(id * 0.01334, 0.3728) * UNITY_PI * 2;
    float z = Random(0.8372, id * 0.01197) * 2 - 1;
    return float3(float2(cos(u), sin(u)) * sqrt(1 - z * z), z);
}

[numthreads(64, 1, 1)]
void Update(uint id : SV_DispatchThreadID)
{
    // Each thread computes the three vertices of one triangle
    int idx1 = id * 3;
    int idx2 = id * 3 + 1;
    int idx3 = id * 3 + 2;
    
    // The seed steps once per unit of Time (staggered per id); pick three pseudo-random points on the unit sphere
    float seed = floor(Time + id * 0.1) * 0.1;
    float3 v1 = RandomPoint(idx1 + seed);
    float3 v2 = RandomPoint(idx2 + seed);
    float3 v3 = RandomPoint(idx3 + seed);

    // v1 stays as the first vertex; v2 and v3 are moved to a distance of Extent from v1, then projected back onto the sphere
    v2 = normalize(v1 + normalize(v2 - v1) * Extent);
    v3 = normalize(v1 + normalize(v3 - v1) * Extent);

    // NoiseFrequency is constant; NoiseOffset grows with deltaTime every frame
    float l1 = snoise(v1 * NoiseFrequency + NoiseOffset).w;
    float l2 = snoise(v2 * NoiseFrequency + NoiseOffset).w;
    float l3 = snoise(v3 * NoiseFrequency + NoiseOffset).w;

    // Cube each noise value (and take the absolute value) to increase contrast
    l1 = abs(l1 * l1 * l1);
    l2 = abs(l2 * l2 * l2);
    l3 = abs(l3 * l3 * l3);

    // Displace each position radially by its noise value
    v1 *= 1 + l1 * NoiseAmplitude;
    v2 *= 1 + l2 * NoiseAmplitude;
    v3 *= 1 + l3 * NoiseAmplitude;

    float3 n = normalize(cross(v2 - v1, v3 - v2));

    PositionBuffer[idx1] = float4(v1, 0);
    PositionBuffer[idx2] = float4(v2, 0);
    PositionBuffer[idx3] = float4(v3, 0);

    NormalBuffer[idx1] = float4(n, 0);
    NormalBuffer[idx2] = float4(n, 0);
    NormalBuffer[idx3] = float4(n, 0);
}

The Compute Shader really is simple: it generates random triangle vertices on a sphere and then displaces them with noise.
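The two geometric steps in the kernel, picking a point on the unit sphere and constraining the other two vertices with Extent, can be modelled in a few lines of plain Python. This is only a sketch of the arithmetic: the function names and the input values 0.10, 0.70, 0.80, 0.20 and 0.3 are made up, and the inputs stand in for the values that `Random()` would return in the shader.

```python
import math

def sphere_point(a, b):
    """Map two numbers in [0, 1] to a point on the unit sphere, like RandomPoint."""
    u = a * math.pi * 2        # angle around the z axis
    z = b * 2 - 1              # height in [-1, 1]
    r = math.sqrt(1 - z * z)   # radius of the circle at that height
    return (math.cos(u) * r, math.sin(u) * r, z)

def length(v):
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    n = length(v)
    return tuple(c / n for c in v)

def constrain(v1, v2, extent):
    """Step `extent` from v1 toward v2, then project back onto the unit sphere."""
    d = normalize(tuple(b - a for a, b in zip(v1, v2)))
    return normalize(tuple(a + c * extent for a, c in zip(v1, d)))

v1 = sphere_point(0.10, 0.70)
v2 = constrain(v1, sphere_point(0.80, 0.20), extent=0.3)
print(length(v1), length(v2))  # both are 1 up to rounding error
```

Because z is chosen uniformly in [-1, 1] and the x/y radius is sqrt(1 - z²), every generated point has unit length before the noise displacement is applied.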

The material shader used with DrawMeshInstancedIndirect then reads the buffers directly:

#if defined(ENABLE_INSTANCING)

StructuredBuffer<float4> _PositionBuffer;
StructuredBuffer<float4> _NormalBuffer;

#endif

void vert(inout appdata v)
{
    #if defined(ENABLE_INSTANCING)

    uint id = unity_InstanceID * 3 + v.vid;

    v.vertex.xyz = _PositionBuffer[id].xyz;
    v.normal = _NormalBuffer[id].xyz;

    #endif
}

void setup()
{
    unity_ObjectToWorld = _LocalToWorld;
    unity_WorldToObject = _WorldToLocal;
}

void surf(Input IN, inout SurfaceOutputStandard o)
{
    o.Albedo = _Color.rgb;
    o.Metallic = _Metallic;
    o.Smoothness = _Smoothness;
    o.Normal = float3(0, 0, IN.vface < 0 ? -1 : 1);
}

NoiseBall 3

Github

The effect is even flashier this time, yet the principle is not much more complicated, and the code is actually shorter.

DrawProcedural

Official documentation

Graphics.DrawProcedural(
    _material,
    new Bounds(transform.position, transform.lossyScale * 5),
    MeshTopology.Triangles, _triangleCount * 3, 1,
    null, null,
    ShadowCastingMode.TwoSided, true, gameObject.layer
);

public static void DrawProcedural(Material material, Bounds bounds, MeshTopology topology, int vertexCount, int instanceCount, Camera camera, MaterialPropertyBlock properties, Rendering.ShadowCastingMode castShadows, bool receiveShadows, int layer);

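There are a lot of parameters, but most of them just need a straightforward value.
Everything else in the script is SetFloat-style calls, so these few lines are really the key part.
The most interesting parameter is probably Camera: it restricts drawing to the given camera, and if it is null the geometry is visible to all cameras.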

Shader

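This time the vertex animation is computed directly in the vertex function.
The key is uint vid : SV_VertexID, from which the shader derives the triangle ID and the vertex's ID within that triangle for the rest of the computation.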
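As a quick illustration of that index arithmetic, the following plain-Python sketch splits a running vertex ID the same way the shader's `vert` function does (integer division by 3, then the remainder), and shows the inverse mapping, which is the `unity_InstanceID * 3 + v.vid` addressing used in NoiseBall2. The function names are hypothetical.

```python
def split_vertex_id(vid):
    """Return (triangle index, vertex index within the triangle)."""
    t_idx = vid // 3           # same as the integer division v.vid / 3
    v_idx = vid - t_idx * 3    # remainder: 0, 1 or 2
    return t_idx, v_idx

def flat_index(t_idx, v_idx):
    """Inverse mapping: position of a vertex in a flat per-vertex buffer."""
    return t_idx * 3 + v_idx

print([split_vertex_id(v) for v in range(7)])
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
```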

Shader "Hidden/NoiseBall3"
{
    SubShader
    {
        Tags { "RenderType"="Opaque" }

        Cull Off

        CGPROGRAM

        #pragma surface surf Standard vertex:vert addshadow
        #pragma target 3.0

        #include "UnityCG.cginc"
        #include "SimplexNoise3D.cginc"

        // Hash function from H. Schechter & R. Bridson, goo.gl/RXiKaH
        uint Hash(uint s)
        {
            s ^= 2747636419u;
            s *= 2654435769u;
            s ^= s >> 16;
            s *= 2654435769u;
            s ^= s >> 16;
            s *= 2654435769u;
            return s;
        }

        // Random number (0-1)
        float Random(uint seed)
        {
            return float(Hash(seed)) / 4294967295.0; // 2^32-1
        }

        // Random point on unit sphere
        float3 RandomPoint(uint seed)
        {
            float u = Random(seed * 2 + 0) * UNITY_PI * 2;
            float z = Random(seed * 2 + 1) * 2 - 1;
            return float3(float2(cos(u), sin(u)) * sqrt(1 - z * z), z);
        }

        struct appdata
        {
            float4 vertex : POSITION;
            float3 normal : NORMAL;
            float4 tangent : TANGENT;
            float4 color : COLOR;
            float4 texcoord1 : TEXCOORD1;
            float4 texcoord2 : TEXCOORD2;
            uint vid : SV_VertexID;
        };

        struct Input
        {
            float vface : VFACE;
            float4 color : COLOR;
        };

        half4 _Color;
        half _Smoothness;
        half _Metallic;
        half3 _Emission;

        uint _TriangleCount;
        float _LocalTime;
        float _Extent;
        float _NoiseAmplitude;
        float _NoiseFrequency;
        float3 _NoiseOffset;

        float4x4 _LocalToWorld;
        float4x4 _WorldToLocal;

        void vert(inout appdata v)
        {
            uint t_idx = v.vid / 3;         // Triangle index
            uint v_idx = v.vid - t_idx * 3; // Vertex index

            // Time dependent random number seed
            uint seed = _LocalTime + (float)t_idx / _TriangleCount;
            seed = ((seed << 16) + t_idx) * 4;

            // Random triangle on unit sphere
            float3 v1 = RandomPoint(seed + 0);
            float3 v2 = RandomPoint(seed + 1);
            float3 v3 = RandomPoint(seed + 2);

            // Constraint with the extent parameter
            v2 = normalize(v1 + normalize(v2 - v1) * _Extent);
            v3 = normalize(v1 + normalize(v3 - v1) * _Extent);

            // Displacement by noise field
            float l1 = snoise(v1 * _NoiseFrequency + _NoiseOffset).w;
            float l2 = snoise(v2 * _NoiseFrequency + _NoiseOffset).w;
            float l3 = snoise(v3 * _NoiseFrequency + _NoiseOffset).w;

            l1 = abs(l1 * l1 * l1);
            l2 = abs(l2 * l2 * l2);
            l3 = abs(l3 * l3 * l3);

            v1 *= 1 + l1 * _NoiseAmplitude;
            v2 *= 1 + l2 * _NoiseAmplitude;
            v3 *= 1 + l3 * _NoiseAmplitude;

            // Vertex position/normal modification
            v.vertex.xyz = v_idx == 0 ? v1 : (v_idx == 1 ? v2 : v3);
            v.normal = normalize(cross(v2 - v1, v3 - v2));

            // Random emission
            v.color = 0.95 < Random(seed + 3);

            // Transform modification
            unity_ObjectToWorld = _LocalToWorld;
            unity_WorldToObject = _WorldToLocal;
        }

        void surf(Input IN, inout SurfaceOutputStandard o)
        {
            o.Albedo = _Color.rgb;
            o.Metallic = _Metallic;
            o.Smoothness = _Smoothness;
            o.Normal = float3(0, 0, IN.vface < 0 ? -1 : 1); // back face support
            o.Emission = _Emission * IN.color.rgb;
        }

        ENDCG
    }
    FallBack "Diffuse"
}
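The hash and seed arithmetic in this shader relies on 32-bit unsigned wrap-around. The sketch below reproduces it in plain Python by masking to 32 bits after each step; it is a model for experimenting with the numbers, not the shader itself. Python computes the final division in double precision, whereas the shader uses 32-bit floats, so individual values may differ slightly from the GPU. The names `hash32`, `random01` and `triangle_seed` are made up.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around

def hash32(s):
    """Model of Hash(uint s) from the shader."""
    s = (s ^ 2747636419) & MASK
    s = (s * 2654435769) & MASK
    s ^= s >> 16
    s = (s * 2654435769) & MASK
    s ^= s >> 16
    s = (s * 2654435769) & MASK
    return s

def random01(seed):
    """Model of Random(uint seed): hash scaled into the range 0-1."""
    return hash32(seed & MASK) / 4294967295.0

def triangle_seed(local_time, t_idx, triangle_count):
    """Model of the per-triangle seed computed in vert()."""
    # Assigning the float expression to a uint truncates it to an integer.
    seed = int(local_time + t_idx / triangle_count) & MASK
    return (((seed << 16) + t_idx) * 4) & MASK

print(triangle_seed(2.0, 5, 100))  # 524308
print(triangle_seed(2.5, 5, 100))  # 524308: same triangle until the integer part steps
```

Multiplying by 4 spaces consecutive triangles four seeds apart, which leaves room for the `seed + 0` to `seed + 3` values used for the three vertices and the emission flag.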

NoiseBall 5

Github

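This version relies on the Job System and does both mesh generation and vertex animation entirely on the CPU, so we only need to look at one script, NoiseBallRenderer.cs.
Each frame does the following: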

void Update()
{
    if (_indexBuffer.Length != VertexCount)
    {
        // The vertex count was changed, or this is the first update.

        // Dispose the current mesh.
        _mesh.Clear();
        DisposeBuffers();

        // Mesh reallocation and reconstruction
        AllocateBuffers();
        UpdateVertexBuffer();
        InitializeMesh();
    }
    else
    {
        // Only update the vertex data.
        UpdateVertexBuffer();
        UpdateMesh();
    }
}

Let's start with DisposeBuffers() and AllocateBuffers():

// Release the previously allocated memory
void DisposeBuffers()
   {
       if (_indexBuffer.IsCreated) _indexBuffer.Dispose();
       if (_vertexBuffer.IsCreated) _vertexBuffer.Dispose();
   }

void AllocateBuffers()
   {
       _indexBuffer = new NativeArray<uint>(
           VertexCount, Allocator.Persistent,
           NativeArrayOptions.UninitializedMemory
       );

       _vertexBuffer = new NativeArray<float3>(
           VertexCount * 2, Allocator.Persistent,
           NativeArrayOptions.UninitializedMemory
       );

       // Index array initialization
       for (var i = 0; i < VertexCount; i++) _indexBuffer[i] = (uint)i;
   }

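AllocateBuffers() simply allocates two NativeArrays. Interestingly, their element types, uint and float3, match the types used on the GPU.
_indexBuffer stores the indices, while _vertexBuffer stores positions and normals.
Allocator specifies how the memory is allocated; Persistent means a long-lived allocation that stays valid until it is explicitly disposed.
NativeArrayOptions.UninitializedMemory requests memory that is not cleared on allocation.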

InitializeMesh() does not run every frame; it is called only on the first update or when the vertex count changes:

void InitializeMesh()
{
    _mesh.SetVertexBufferParams(
        VertexCount,
        new VertexAttributeDescriptor
            (VertexAttribute.Position, VertexAttributeFormat.Float32, 3),
        new VertexAttributeDescriptor
            (VertexAttribute.Normal, VertexAttributeFormat.Float32, 3)
    );
    _mesh.SetVertexBufferData(_vertexBuffer, 0, 0, VertexCount * 2);

    _mesh.SetIndexBufferParams(VertexCount, IndexFormat.UInt32);
    _mesh.SetIndexBufferData(_indexBuffer, 0, 0, VertexCount);

    _mesh.SetSubMesh(0, new SubMeshDescriptor(0, VertexCount));
    _mesh.bounds = new Bounds(Vector3.zero, Vector3.one * 1000);
}

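Mesh.SetVertexBufferParams sets the size and layout of the vertex buffer. It is easy to understand, though the syntax is rather particular.
SetVertexBufferData then fills the buffer from the memory allocated earlier: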
SetVertexBufferData(NativeArray<T> data, int dataStart, int meshBufferStart, int count);

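count: Number of vertices to copy.

I still don't fully understand this wording, but the factor of 2 is presumably because each vertex has two attributes, position and normal. The array therefore holds two float3 elements per vertex, and count appears to be measured in array elements rather than in whole vertices.
After that come the SubMesh and Bounds settings, which need no special comment.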
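The sizes involved can be checked with a little arithmetic. The sketch below, in plain Python, assumes what the code above implies: three vertices per triangle, and a position and a normal of three 32-bit floats each stored in the same vertex stream. The function name and the triangle count of 100 are made up, and the reading of `count` as "number of float3 elements" is an inference from the `VertexCount * 2` in the project, not something the documentation states.

```python
FLOAT3_BYTES = 3 * 4  # three 32-bit floats

def buffer_layout(triangle_count):
    """Return (vertex count, float3 elements in the array, bytes per vertex)."""
    vertex_count = triangle_count * 3     # three vertices per triangle
    float3_elements = vertex_count * 2    # one position + one normal per vertex
    stride = 2 * FLOAT3_BYTES             # bytes occupied by one vertex
    return vertex_count, float3_elements, stride

vertex_count, float3_elements, stride = buffer_layout(100)
print(vertex_count, float3_elements, stride)  # 300 600 24
# Both ways of counting describe the same number of bytes.
print(float3_elements * FLOAT3_BYTES == vertex_count * stride)  # True
```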

UpdateVertexBuffer() and UpdateMesh() both run every frame, and the former is the heart of the effect.
The latter is trivial: it just uploads the vertex buffer data again each frame.

void UpdateMesh()
{
    _mesh.SetVertexBufferData(_vertexBuffer, 0, 0, VertexCount * 2);
}

UpdateVertexBuffer() is shown below: it creates a VertexUpdateJob struct, assigns a few fields, and calls Schedule().

void UpdateVertexBuffer()
{
    var job = new VertexUpdateJob{
        seed = _randomSeed,
        extent = _extent,
        noiseFrequency = _noiseFrequency,
        noiseOffset = _noiseAnimation * Time.time,
        noiseAmplitude = _noiseAmplitude,
        buffer = _vertexBuffer
    };

    job.Schedule((int)_triangleCount, 64).Complete();
}

That makes the VertexUpdateJob struct the most important piece.

Job

IJobParallelFor
Here is the full code of the job first:

[BurstCompile(CompileSynchronously = true,
    FloatMode = FloatMode.Fast, FloatPrecision = FloatPrecision.Low)]
struct VertexUpdateJob : IJobParallelFor
{
    
    [ReadOnly] public uint seed;
    [ReadOnly] public float extent;
    [ReadOnly] public float noiseFrequency;
    [ReadOnly] public float3 noiseOffset;
    [ReadOnly] public float noiseAmplitude;

    [NativeDisableParallelForRestriction]
    [WriteOnly] public NativeArray<float3> buffer;

    Random _random;

    float3 RandomPoint()
    {
        var u = _random.NextFloat(math.PI * 2);
        var z = _random.NextFloat(-1, 1);
        var xy = math.sqrt(1 - z * z);
        return math.float3(math.cos(u) * xy, math.sin(u) * xy, z);
    }

    public void Execute(int i)
    {
        _random = new Random(seed + (uint)i * 10);
        _random.NextInt();

        var v1 = RandomPoint();
        var v2 = RandomPoint();
        var v3 = RandomPoint();

        v2 = math.normalize(v1 + math.normalize(v2 - v1) * extent);
        v3 = math.normalize(v1 + math.normalize(v3 - v1) * extent);

        var l1 = noise.snoise(v1 * noiseFrequency + noiseOffset);
        var l2 = noise.snoise(v2 * noiseFrequency + noiseOffset);
        var l3 = noise.snoise(v3 * noiseFrequency + noiseOffset);

        l1 = math.abs(l1 * l1 * l1);
        l2 = math.abs(l2 * l2 * l2);
        l3 = math.abs(l3 * l3 * l3);

        v1 *= 1 + l1 * noiseAmplitude;
        v2 *= 1 + l2 * noiseAmplitude;
        v3 *= 1 + l3 * noiseAmplitude;

        var n = math.cross(v2 - v1, v3 - v1);

        var offs = i * 6;

        buffer[offs++] = v1;
        buffer[offs++] = n;
        buffer[offs++] = v2;
        buffer[offs++] = n;
        buffer[offs++] = v3;
        buffer[offs++] = n;
    }
}

The core logic is the same as in the earlier NoiseBalls, so let's focus on what differs.
It starts with an attribute that sets several options:
[BurstCompile(CompileSynchronously = true,FloatMode = FloatMode.Fast, FloatPrecision = FloatPrecision.Low)]
Official documentation
These options configure how Burst compiles the job.
The first one, CompileSynchronously, does what its name says:

Gets or sets whether or not to burst compile the code immediately on first use, or in the background over time.

It controls whether the code is Burst-compiled immediately on first use; if set to false, compilation happens gradually in the background.

FloatMode = FloatMode.Fast
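This allows "algebraically equivalent optimizations": the compiler may rearrange floating-point arithmetic to run faster as long as the result is algebraically the same, so the output can differ slightly from a strict evaluation. See the official documentation for details.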

FloatPrecision = FloatPrecision.Low
This sets the floating-point precision to low, trading accuracy for speed.

Next come a number of fields marked [ReadOnly]:

By declaring it as read only, multiple jobs are allowed to access the data in parallel

Marking data as ReadOnly allows it to be accessed by multiple jobs in parallel.

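The buffer that receives the output is additionally marked with
[NativeDisableParallelForRestriction]
Unfortunately, when I looked, the official documentation did not describe this attribute in any detail (in fact it said nothing at all). Despite how the name might read, it does not turn parallelism off: it lifts the safety restriction that lets Execute(i) in a parallel-for job write only to element i of a NativeArray. That is needed here because each iteration writes six elements, starting at i * 6.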

It then creates a Random (from Unity.Mathematics) and uses its functions, for example:

/// <summary>Returns a uniformly random float value in the interval [0, 1).</summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public float NextFloat()
{
    return asfloat(0x3f800000 | (NextState() >> 9)) - 1.0f;
}

This generates a pseudo-random float. To be honest, I could not follow the bit manipulation...
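The expression becomes easier to read once you know that 0x3f800000 is the bit pattern of the 32-bit float 1.0. Shifting the 32-bit state right by 9 leaves 23 bits, exactly the width of the mantissa, so OR-ing them in yields a float in [1, 2), and subtracting 1.0 moves it to [0, 1). The plain-Python sketch below reproduces only that conversion; `asfloat` is emulated with `struct`, the function name `next_float_from_state` is made up, and the state values are arbitrary inputs rather than what Unity's `NextState()` would actually produce.

```python
import struct

def asfloat(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def next_float_from_state(state):
    """Model of the conversion in NextFloat for a given 32-bit state."""
    mantissa = (state & 0xFFFFFFFF) >> 9         # keep the top 23 bits
    return asfloat(0x3F800000 | mantissa) - 1.0  # [1, 2) shifted to [0, 1)

print(next_float_from_state(0x00000000))  # 0.0
print(next_float_from_state(0x80000000))  # 0.5
print(next_float_from_state(0xFFFFFFFF) == 1 - 2 ** -23)  # True: the largest value, just below 1
```

The trick avoids an integer-to-float division and guarantees the result never reaches 1.0.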

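The other functions, such as normalize and snoise, also come from Unity.Mathematics.

The actual computation is no different from before.
From the order of the buffer writes we can also see that positions and normals are stored interleaved, rather than all positions first followed by all normals.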
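The interleaved layout follows directly from `offs = i * 6` and the six writes in Execute. A small plain-Python sketch (with hypothetical helper names) lists which buffer slots a given triangle uses, and checks that different triangles never touch the same slot, which is what makes the parallel writes safe.

```python
def slots(i):
    """Buffer slots written by Execute(i): (position slots, normal slots)."""
    offs = i * 6
    positions = [offs, offs + 2, offs + 4]    # v1, v2, v3
    normals = [offs + 1, offs + 3, offs + 5]  # the same n after each position
    return positions, normals

def write_range(i):
    """All slots written by iteration i."""
    return range(i * 6, i * 6 + 6)

print(slots(2))  # ([12, 14, 16], [13, 15, 17])

# Four triangles together cover slots 0..23 exactly once.
written = [s for i in range(4) for s in write_range(i)]
print(sorted(written) == list(range(24)))  # True
```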

On the main thread the job is run with job.Schedule((int)_triangleCount, 64).Complete();
As for its parameters, the comments on IJobParallelFor say:

Schedule a parallel-for job. First parameter is how many for-each iterations to perform.
The second parameter is the batch size, essentially the no-overhead innerloop that just invokes Execute(i) in a loop.
When there is a lot of work in each iteration then a value of 1 can be sensible.
When there is very little work values of 32 or 64 can make sense.

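The first parameter is the total number of iterations to perform, here one per triangle.
The second parameter is the batch size: each batch is essentially an overhead-free inner loop that just calls Execute(i) repeatedly.
When each iteration does a lot of work a value of 1 is reasonable; when it does very little, 32 or 64 works well.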
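To picture what the batch size means, the following plain-Python sketch splits an iteration range into consecutive batches. It only models the index ranges; how Unity actually hands batches to worker threads is not modelled here, and the function name and the count of 200 iterations are assumptions for illustration.

```python
def batches(iteration_count, batch_size):
    """Split [0, iteration_count) into consecutive (start, end) batches."""
    return [(start, min(start + batch_size, iteration_count))
            for start in range(0, iteration_count, batch_size)]

# 200 triangles with a batch size of 64, as in Schedule(200, 64).
print(batches(200, 64))
# [(0, 64), (64, 128), (128, 192), (192, 200)]
```

Within one batch, Execute(i) is simply called for every i from start to end - 1, so a larger batch size means fewer, longer inner loops.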

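The official documentation covers the other Schedule parameters.

I jumped straight into the API this time without a big-picture understanding of the Job System, so I still feel a little lost.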

