shsolanki committed on
Commit
c8d6c7d
·
1 Parent(s): 5a3a32c

update model cards

README.md CHANGED
@@ -1,100 +1,99 @@
1
  ---
2
  language:
3
- - en
4
  license: other
5
  license_name: nvidia-open-model-license
6
  license_link: >-
7
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
8
  tags:
9
- - nvidia
10
- - asset-harvester
11
- - image-to-3d
12
- - 3d-generation
13
- - gaussian-splatting
14
- - physical-ai
15
  pipeline_tag: image-to-3d
16
  ---
17
 
18
  # Asset Harvester | System Model Card
19
 
 
 
20
  ## **Description:**
21
 
22
- Asset Harvester is a system that leverages 4 models (see System Architecture below) to generate three-dimensional (3D) assets from a single image or multiple images of vehicles. [Mask2Former](https://docs.google.com/document/d/1OKMAhNruoLE254xLLdIWULPuwUWGNsbpg36BNUnpTSQ/edit?tab=t.0#heading=h.7axn5fq6ipu5) and [C-RADIO](https://huggingface.co/nvidia/C-RADIO) are used for view extraction from NCore data sessions, the [Multiview Diffusion (Sana-based)](https://docs.google.com/document/d/1y7qU1to8TrV07Tfz3crxJiuA_AL0Wlwwp6C-RW-NoLg/edit?tab=t.0#heading=h.g8ogslbqcx12) is then used to generate 16 multiview images of the input vehicle, and lastly [TokenGS](https://docs.google.com/document/d/1EZWB-had-1MMmrES9bvQlJHpjXQFawR619sX3HvsVpQ/edit?usp=sharing) generates the output 3D asset.
23
 
24
  This system is ready for commercial/non-commercial use
25
 
26
- ### **License/Terms of Use**:
27
 
28
- ### GOVERNING TERMS: Your use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) .
29
 
30
  **Deployment Geography:** Global
31
 
 
 
 
 
32
  ## **Automation Level:**
33
 
34
- Full Automation
35
 
36
  ## **Use Case:**
37
 
38
- Physical AI developers who are looking to create 3D assets of vehicles for either closed-loop simulation or Synthetic Data Generation (SDG).
39
 
40
  ## **Known Technical Limitations:**
41
 
42
- The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset, like the following example:
43
 
44
- ## Known Risk(s):
 
45
 
46
  AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations.
47
 
48
- ##
49
 
50
- **Release Date:** Public Github \[03/12/2026\]
51
 
52
- **Reference(s):** None
53
 
54
  ## **System Architecture**
55
-
56
- **Architecture Diagram:**
57
-
58
- The following models are used by this system:
59
-
60
- * [Mask2Former Model Card](https://docs.google.com/document/d/1OKMAhNruoLE254xLLdIWULPuwUWGNsbpg36BNUnpTSQ/edit?tab=t.0#heading=h.7axn5fq6ipu5)
61
- * [C-RADIO Model Card](https://huggingface.co/nvidia/C-RADIO)
62
- * [Multiview Diffusion (Sana-based) Model Card](https://docs.google.com/document/d/1y7qU1to8TrV07Tfz3crxJiuA_AL0Wlwwp6C-RW-NoLg/edit?tab=t.0#heading=h.g8ogslbqcx12)
63
- * [TokenGS Model Card](https://docs.google.com/document/d/1EZWB-had-1MMmrES9bvQlJHpjXQFawR619sX3HvsVpQ/edit?usp=sharing)
64
 
65
  ## **System Input:**
66
 
67
- **Input Type(s):** 1 or more images (up until 4\)
68
- **Input Format:** Red, Green, Blue (RGB)
69
- **Input Parameters:** Two-Dimensional (2D)
70
- **Other Properties Related to Input:**
71
 
72
- We currently accept up to 4 input images for each object. The resolution of the images are 512x512. The input images are extracted from NVIDIA's NCore data along w/ other metadata needed for downstream processing:
73
 
74
- * Camera orientation of each image
75
- * Camera distance of each image
76
- * Camera field of view of each image
77
  * Bounding box dimensions of each object
78
 
79
  ## **System Output:**
80
 
81
- **Output Type(s):** Corresponding 3D Gaussian asset to the object in input images
82
- **Output Format:** Polygon File Format (PLY)
83
- **Output Parameters:** Three-Dimensional (3D)
84
- **Other Properties Related to Output:**
85
 
86
- A [PLY file](https://en.wikipedia.org/wiki/PLY_(file_format)#:~:text=PLY%20is%20a%20computer%20file,dimensional%20data%20from%203D%20scanners.) (3D Gaussian Splatting, 3DGS) contains 3D object data with the following specific components:
87
 
88
- * **Header**: Defines the file structure, including format (ASCII or binary), Gaussian elements, their properties (e.g., position, appearance coefficients, opacity, scale, rotation), and data types (e.g., float, int).
89
- * **Gaussian Data**: Stores the parameters of each 3D Gaussian, including its center position (`x, y, z`), and optionally properties such as normals (`nx, ny, nz`), color or spherical harmonics coefficients (`f_dc_0, f_dc_1, f_dc_2`, and higher-order terms), opacity, anisotropic scale, and rotation.
90
 
91
  ## **Hardware Compatibility:**
92
 
93
  **Supported Hardware Microarchitecture Compatibility:**
94
 
95
- * NVIDIA Ampere
96
- * NVIDIA Blackwell
97
- * NVIDIA Hopper
98
  * NVIDIA Lovelace
99
 
100
  **Preferred/Supported Operating Systems:** Linux
@@ -103,14 +102,14 @@ A [PLY file](https://en.wikipedia.org/wiki/PLY_(file_format)#:~:text=PLY%20is%20
103
 
104
  The systems can run on a single GPU with an Nvidia GPU with CUDA Compute Capability greater than or equal to 8.0. The following is required:
105
 
106
- * GPU performance \>= 300 Tflops
107
- * GPU memory size \>= 30GB
108
- * GPU memory bandwidth \>= 768 GB/s
109
- * System RAM \>= 32 GB
110
- * System disk storage \>= 100GB
111
  * CPU \>= 16 threads x 3GHz
112
 
113
- ##
114
 
115
  ## **System Version:**
116
 
@@ -118,7 +117,7 @@ Asset\_Harvester\_GA
118
 
119
  ## **Inference:**
120
 
121
- **Engine:** Pytorch
122
  **Test Hardware:** A100, H100
123
 
124
  ## **Ethical Considerations:**
@@ -131,30 +130,30 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
131
 
132
  ## Model Card++
133
 
134
- ### Bias
135
 
136
  | Field | Response |
137
  | :---- | :---- |
138
  | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
139
  | Measures taken to mitigate against unwanted bias: | None |
140
 
141
- ### Explainability
142
 
143
  | Field | Response |
144
  | :---- | :---- |
145
- | Intended Domain | Advanced Driver Assistance Systems |
146
  | Model Type: | Image-to-3D Asset |
147
  | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
148
  | Output | 3D Asset |
149
- | Describe how the model works | The system takes as an input an image, and outputs a corresponding 3D asset |
150
  | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
151
- | Technical Limitations | The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset |
152
  | Verified to have met prescribed NVIDIA quality standards | Yes |
153
  | Performance Metrics | PSNR (Peak Signal-to-Noise Ratio) |
154
  | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
155
- | Licensing | The use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
156
 
157
- ### Privacy
158
 
159
  | Field | Response |
160
  | :---- | :---- |
@@ -172,17 +171,11 @@ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.
172
  | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
173
  | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
174
 
175
- ### Safety & Security
176
 
177
  | Field | Response |
178
  | :---- | :---- |
179
  | Model Application(s): | 3D Asset Generation |
180
  | Describe the life critical impact (if present). | N/A \- The system should not be deployed in a vehicle to perform life-critical tasks. |
181
- | Use Case Restrictions: | Abide by [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
182
  | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training |
183
-
184
- [image1]: images/image1.png
185
-
186
- [image2]: images/image2.png
187
-
188
- [image3]: images/image3.png
 
1
  ---
2
  language:
3
+ - en
4
  license: other
5
  license_name: nvidia-open-model-license
6
  license_link: >-
7
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license
8
  tags:
9
+ - nvidia
10
+ - asset-harvester
11
+ - image-to-3d
12
+ - 3d-generation
13
+ - gaussian-splatting
14
+ - physical-ai
15
  pipeline_tag: image-to-3d
16
  ---
17
 
18
  # Asset Harvester | System Model Card
19
 
20
+ ### [Paper (coming soon)]() | [Project Page (coming soon)](https://research.nvidia.com/labs/sil/asset-harvester) | [Code](https://github.com/NVIDIA/asset-harvester) | [Model](https://huggingface.co/nvidia/asset-harvester) | [Data](https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles-NCore)
21
+
22
  ## **Description:**
23
 
24
+ **Asset Harvester** is a system that leverages 4 models (see the white paper for the architecture) to generate 3D assets from one or more images of vehicles or vulnerable road users (VRUs). The [AV object Mask2Former]() instance segmentation model processes images when parsing input views from NCore data sessions. The input images are then encoded by [C-RADIO](https://huggingface.co/nvidia/C-RADIO), the multiview diffusion model [SparseViewDiT]() generates 16 multiview images of each input object, and finally [Object TokenGS]() lifts these images to a 3D asset.
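
The four stages described above form a simple chain in which each model consumes the previous model's output. The sketch below is a minimal, hypothetical outline of that data flow only: the stage names (`segment`, `encode`, `generate_views`, `lift_to_3d`), the `Stage` and `run_pipeline` helpers, and the stub outputs are placeholders for illustration, not the Asset Harvester API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Stage:
    """One hypothetical pipeline stage: a name plus the callable that runs it."""
    name: str
    run: Callable[[Any], Any]


def run_pipeline(stages: List[Stage], inputs: Any) -> Tuple[Any, List[str]]:
    """Feed each stage's output into the next and record the order of execution."""
    trace = []
    x = inputs
    for stage in stages:
        x = stage.run(x)
        trace.append(stage.name)
    return x, trace


NUM_VIEWS = 16  # the multiview diffusion model generates 16 views per object

# Stubs standing in for the four models; real stages would return tensors and a PLY asset.
STAGES = [
    Stage("segment", lambda imgs: [f"masked({i})" for i in imgs]),                   # Mask2Former
    Stage("encode", lambda imgs: [f"radio({i})" for i in imgs]),                      # C-RADIO
    Stage("generate_views", lambda feats: [f"view_{k}" for k in range(NUM_VIEWS)]),  # SparseViewDiT
    Stage("lift_to_3d", lambda views: {"format": "ply", "num_views_used": len(views)}),  # Object TokenGS
]
```

For example, `run_pipeline(STAGES, ["img0", "img1"])` returns the stub asset together with the stage order, which makes the dependency of each model on the previous one explicit.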
25
 
26
  This system is ready for commercial/non-commercial use
27
 
28
+ ### **License/Terms of Use**:
29
 
30
+ ### Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
31
 
32
  **Deployment Geography:** Global
33
 
34
+ ### **Release Management:**
35
+
36
+ This system is released as a collection of models on [HuggingFace](https://huggingface.co/nvidia/asset-harvester) and inference scripts on [GitHub](https://github.com/NVIDIA/asset-harvester).
37
+
38
  ## **Automation Level:**
39
 
40
+ Partial Automation
41
 
42
  ## **Use Case:**
43
 
44
+ Physical AI developers who are looking to create 3D assets of vehicles or VRUs for either closed-loop simulation or Synthetic Data Generation (SDG).
45
 
46
  ## **Known Technical Limitations:**
47
 
48
+ The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle can generate a poor or hallucinated 3D asset.
49
 
50
+
51
+ ## Known Risk(s):
52
 
53
  AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations.
54
 
55
+ ##
56
 
57
+ **Reference(s):**
58
 
59
+ [Asset Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation]()
60
 
61
  ## **System Architecture**
62
+ System architecture details are described in the white paper referenced above.
63
 
64
  ## **System Input:**
65
 
66
+ **Input Type(s):** 1 or more images (up to 4)
67
+ **Input Format:** Red, Green, Blue (RGB)
68
+ **Input Parameters:** Two-Dimensional (2D)
69
+ **Other Properties Related to Input:**
70
 
71
+ We currently accept up to 4 input images for each object, each at a resolution of 512x512. The input images are extracted from NVIDIA's NCore data, along with the other metadata needed for downstream processing:
72
 
73
+ * Camera orientation of each image
74
+ * Camera distance of each image
75
+ * Camera field of view of each image
76
  * Bounding box dimensions of each object
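
The per-object input described above (1 to 4 RGB images at 512x512, plus per-image camera orientation, distance, and field of view, and per-object bounding box dimensions) can be captured in a small container with a validation step. This is a sketch under stated assumptions: the class names `ViewInput` and `ObjectInput`, the `validate` helper, and the units (radians, meters) are hypothetical, since the card does not specify how the NCore metadata is encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_VIEWS = 4            # the system accepts up to 4 input images per object
IMAGE_SIZE = (512, 512)  # documented input resolution (height, width)


@dataclass
class ViewInput:
    """One input view; field names and units are illustrative assumptions."""
    image_hw: Tuple[int, int]                # (height, width) of the RGB image
    orientation: Tuple[float, float, float]  # camera orientation, e.g. (yaw, pitch, roll) in radians
    distance: float                          # camera-to-object distance, e.g. in meters
    fov: float                               # camera field of view, e.g. in radians


@dataclass
class ObjectInput:
    views: List[ViewInput]
    bbox_dims: Tuple[float, float, float]    # bounding box (length, width, height)


def validate(obj: ObjectInput) -> None:
    """Raise ValueError if the object violates the documented input constraints."""
    if not 1 <= len(obj.views) <= MAX_VIEWS:
        raise ValueError(f"expected 1-{MAX_VIEWS} views, got {len(obj.views)}")
    for i, v in enumerate(obj.views):
        if tuple(v.image_hw) != IMAGE_SIZE:
            raise ValueError(f"view {i}: expected {IMAGE_SIZE}, got {v.image_hw}")
        if v.distance <= 0 or v.fov <= 0:
            raise ValueError(f"view {i}: distance and fov must be positive")
    if any(d <= 0 for d in obj.bbox_dims):
        raise ValueError("bounding box dimensions must be positive")
```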
77
 
78
  ## **System Output:**
79
 
80
+ **Output Type(s):** A 3D Gaussian asset corresponding to the object in the input images
81
+ **Output Format:** Polygon File Format (PLY)
82
+ **Output Parameters:** Three-Dimensional (3D)
83
+ **Other Properties Related to Output:**
84
 
85
+ A PLY file (3D Gaussian Splatting, 3DGS) contains 3D object data with the following specific components:
86
 
87
+ * **Header**: Defines the file structure, including format (ASCII or binary), Gaussian elements, their properties (e.g., position, appearance coefficients, opacity, scale, rotation), and data types (e.g., float, int).
88
+ * **Gaussian Data**: Stores the parameters of each 3D Gaussian as vertex elements: center position (`x`, `y`, `z`), spherical harmonics DC coefficients (`f_dc_0`, `f_dc_1`, `f_dc_2`), `opacity`, anisotropic scale (`scale_0`, `scale_1`, `scale_2`), and rotation quaternion (`rot_0`, `rot_1`, `rot_2`, `rot_3`).
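
As a concrete illustration of the header and per-Gaussian properties listed above, the sketch below writes a minimal binary little-endian PLY with one `vertex` element per Gaussian and reads it back, using only the Python standard library. The property order and the use of `float` for every field follow the common 3DGS convention and are assumptions, not a specification of Asset Harvester's exact output; optional fields such as normals and higher-order SH coefficients (`f_rest_*`) are omitted. The helper names are hypothetical.

```python
import os
import struct
import tempfile
from typing import Dict, List, Tuple

# Per-Gaussian properties named in the model card, in a common 3DGS order (assumed).
PROPS = ["x", "y", "z",
         "f_dc_0", "f_dc_1", "f_dc_2",
         "opacity",
         "scale_0", "scale_1", "scale_2",
         "rot_0", "rot_1", "rot_2", "rot_3"]


def write_gaussian_ply(path: str, gaussians: List[Dict[str, float]]) -> None:
    """Write Gaussians as a binary little-endian PLY 'vertex' element."""
    header = ["ply", "format binary_little_endian 1.0",
              f"element vertex {len(gaussians)}"]
    header += [f"property float {p}" for p in PROPS]
    header.append("end_header")
    row = struct.Struct("<" + "f" * len(PROPS))  # one float32 per property
    with open(path, "wb") as f:
        f.write(("\n".join(header) + "\n").encode("ascii"))
        for g in gaussians:
            f.write(row.pack(*(g[p] for p in PROPS)))


def _read_header(f) -> List[str]:
    """Read ASCII header lines up to and including 'end_header'."""
    lines = []
    while True:
        raw = f.readline()
        if not raw:
            raise ValueError("PLY header has no end_header line")
        line = raw.decode("ascii").rstrip("\n")
        lines.append(line)
        if line == "end_header":
            return lines


def read_gaussian_ply(path: str) -> Tuple[List[str], List[Dict[str, float]]]:
    """Return (header lines, list of per-Gaussian property dicts)."""
    with open(path, "rb") as f:
        header = _read_header(f)
        count = next(int(ln.split()[2]) for ln in header if ln.startswith("element vertex"))
        names = [ln.split()[2] for ln in header if ln.startswith("property float")]
        row = struct.Struct("<" + "f" * len(names))
        data = f.read(row.size * count)
    gaussians = [dict(zip(names, row.unpack_from(data, i * row.size))) for i in range(count)]
    return header, gaussians


def roundtrip(gaussians: List[Dict[str, float]]):
    """Usage example: write to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "asset.ply")
        write_gaussian_ply(path, gaussians)
        return read_gaussian_ply(path)
```

Writing the header as ASCII and the body as packed float32 rows mirrors how most 3DGS tools store splats; an ASCII body would also be valid PLY but is much larger.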
89
 
90
  ## **Hardware Compatibility:**
91
 
92
  **Supported Hardware Microarchitecture Compatibility:**
93
 
94
+ * NVIDIA Ampere
95
+ * NVIDIA Blackwell
96
+ * NVIDIA Hopper
97
  * NVIDIA Lovelace
98
 
99
  **Preferred/Supported Operating Systems:** Linux
 
102
 
103
  The systems can run on a single GPU with an Nvidia GPU with CUDA Compute Capability greater than or equal to 8.0. The following is required:
104
 
105
+ * GPU performance \>= 300 Tflops
106
+ * GPU memory size \>= 30GB
107
+ * GPU memory bandwidth \>= 768 GB/s
108
+ * System RAM \>= 32 GB
109
+ * System disk storage \>= 100GB
110
  * CPU \>= 16 threads x 3GHz
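
The requirements above can be expressed as a simple pre-flight check. The sketch compares a device description against the documented minimums. `DeviceInfo` and `unmet_requirements` are hypothetical names; in practice the compute capability and memory could be read with `torch.cuda.get_device_capability()` and `torch.cuda.get_device_properties(0).total_memory`, but those calls are left out so the sketch runs without a GPU. Peak TFLOPS and memory bandwidth are not reported by PyTorch and would have to come from the vendor's specifications.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DeviceInfo:
    """Hypothetical host/GPU description; values are supplied by the caller."""
    compute_capability: Tuple[int, int]  # (major, minor)
    gpu_tflops: float
    gpu_memory_gb: float
    gpu_bandwidth_gbs: float
    system_ram_gb: float
    disk_gb: float
    cpu_threads: int
    cpu_ghz: float


# Minimums taken from the list above.
MIN_COMPUTE_CAPABILITY = (8, 0)
REQUIREMENTS = {
    "gpu_tflops": 300,
    "gpu_memory_gb": 30,
    "gpu_bandwidth_gbs": 768,
    "system_ram_gb": 32,
    "disk_gb": 100,
    "cpu_threads": 16,
    "cpu_ghz": 3.0,
}


def unmet_requirements(dev: DeviceInfo) -> List[str]:
    """Return a human-readable list of requirements the device does not meet."""
    problems = []
    if tuple(dev.compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {dev.compute_capability} < {MIN_COMPUTE_CAPABILITY}")
    for field, minimum in REQUIREMENTS.items():
        value = getattr(dev, field)
        if value < minimum:
            problems.append(f"{field}={value} < {minimum}")
    return problems
```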
111
 
112
+
113
 
114
  ## **System Version:**
115
 
 
117
 
118
  ## **Inference:**
119
 
120
+ **Engine:** PyTorch
121
  **Test Hardware:** A100, H100
122
 
123
  ## **Ethical Considerations:**
 
130
 
131
  ## Model Card++
132
 
133
+ **Bias**
134
 
135
  | Field | Response |
136
  | :---- | :---- |
137
  | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
138
  | Measures taken to mitigate against unwanted bias: | None |
139
 
140
+ **Explainability**
141
 
142
  | Field | Response |
143
  | :---- | :---- |
144
+ | Intended Domain | Autonomous Driving Simulation |
145
  | Model Type: | Image-to-3D Asset |
146
  | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
147
  | Output | 3D Asset |
148
+ | Describe how the model works | The system takes one or a few images (up to 4) as input and outputs a corresponding 3D asset |
149
  | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
150
+ | Technical Limitations | The system is not guaranteed to perform well with occluded objects or objects that are outside of the common distribution. For example, a heavily occluded vehicle image can generate a poor or hallucinated 3D asset |
151
  | Verified to have met prescribed NVIDIA quality standards | Yes |
152
  | Performance Metrics | PSNR (Peak Signal-to-Noise Ratio) |
153
  | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
154
+ | Licensing | Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
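
PSNR, the performance metric listed above, is defined as $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value and MSE is the mean squared error between two images; higher is better. For 3D generation it is typically computed between a rendered view and a reference image, but the card does not state which views are evaluated. The pure-Python sketch below assumes both images are flattened to equal-length lists with values in [0, 1].

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 on every pixel gives an MSE of 0.01 and therefore a PSNR of 20 dB.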
155
 
156
+ **Privacy**
157
 
158
  | Field | Response |
159
  | :---- | :---- |
 
171
  | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
172
  | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
173
 
174
+ **Safety & Security**
175
 
176
  | Field | Response |
177
  | :---- | :---- |
178
  | Model Application(s): | 3D Asset Generation |
179
  | Describe the life critical impact (if present). | N/A \- The system should not be deployed in a vehicle to perform life-critical tasks. |
180
+ | Use Case Restrictions: | Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
181
  | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training |
model_cards/Mask2Former.md ADDED
@@ -0,0 +1,144 @@
1
+ # Mask2Former Overview | Model Card
2
+
3
+ ## **Description:**
4
+
5
+ Mask2Former is a universal model that performs object detection and instance segmentation tasks.
6
+
7
+ This model is used in the Asset Harvester System.
8
+
9
+ ### **License/Terms of Use:**
10
+
11
+ GOVERNING TERMS: The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
12
+
13
+ ### **Deployment Geography:**
14
+
15
+ Global
16
+
17
+ ### **Use Case:**
18
+
19
+ Mask2Former can be used for image segmentation across panoptic, instance, and semantic tasks, enabling detailed scene understanding without requiring task-specific customization. It also excels in processing complex images, such as autonomous driving scenes or crowded environments, providing accurate object boundaries, semantic labels, and instance differentiation in a single unified output.
20
+
21
+ ### **Release Date:**
22
+
23
+ HuggingFace 03/16/26
24
+
25
+ ## **Reference:**
26
+
27
+ [Bowen Cheng](https://arxiv.org/search/cs?searchtype=author&query=Cheng,+B), [Ishan Misra](https://arxiv.org/search/cs?searchtype=author&query=Misra,+I), [Alexander G. Schwing](https://arxiv.org/search/cs?searchtype=author&query=Schwing,+A+G), [Alexander Kirillov](https://arxiv.org/search/cs?searchtype=author&query=Kirillov,+A), [Rohit Girdhar](https://arxiv.org/search/cs?searchtype=author&query=Girdhar,+R), Masked-attention Mask Transformer for Universal Image Segmentation, [https://arxiv.org/abs/2112.01527](https://arxiv.org/abs/2112.01527).
28
+
29
+ ## **Model Architecture:**
30
+
31
+ * Fully Convolutional Networks (FCNs) + Transformer
32
+
33
+ ## **Input:**
34
+
35
+ * **Input Type(s):** Image
36
+ * **Input Format(s):** Red, Green, Blue (RGB)
37
+ * **Input Parameters:** Internally, the Transformer decoder takes 2D query features ($X_0$) of dimensions N x C, where N is the number of query features and C is the number of channels, together with 3D image features ($K_l$, $V_l$).
38
+ * **Other Properties Related to Input:** The image features have spatial resolutions of 1/32, 1/16, and 1/8 of the input image.
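
The referenced paper describes how these inputs are used: each decoder layer applies masked attention, $X_l = \mathrm{softmax}(\mathcal{M}_{l-1} + Q_l K_l^{\top}) V_l + X_{l-1}$, where $\mathcal{M}_{l-1}$ is 0 where the previous layer's binarized mask prediction is foreground and $-\infty$ elsewhere. The NumPy sketch below implements that equation for a single head. It assumes $Q_l = X_{l-1}$ (no learned projections), omits the scaling applied by standard multi-head attention layers, and follows a common convention of letting a query with an empty mask attend to all positions. It illustrates the mechanism only and is not the model's actual code.

```python
import numpy as np


def masked_attention(X_prev: np.ndarray, K: np.ndarray, V: np.ndarray,
                     fg_mask: np.ndarray) -> np.ndarray:
    """One single-head masked-attention step (no projections, no scaling).

    X_prev:  (N, C) query features from the previous layer, used directly as queries.
    K, V:    (HW, C) flattened image features at this layer's resolution.
    fg_mask: (N, HW) bool, True where the previous layer's binarized mask is foreground.
    """
    fg = fg_mask.copy()
    fg[~fg.any(axis=1)] = True                    # empty mask: let the query attend everywhere
    M = np.where(fg, 0.0, -np.inf)                # additive mask: 0 on foreground, -inf elsewhere
    logits = X_prev @ K.T + M                     # (N, HW)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability before exp
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V + X_prev                   # attention output plus residual
```

Restricting attention to the predicted foreground is what lets each query refine its own object instead of attending to the whole image.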
39
+
40
+ ## **Output:**
41
+
42
+ * **Output Type(s):** Image
43
+ * **Output Format(s):** Binary mask
44
+ * **Output Parameters:** One predicted mask per query; for N x C input query features (N queries, C channels), the model outputs N masks.
45
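+ * **Other Properties Related to Output:** Feature-map resolutions $H_1 = H/32$, $H_2 = H/16$, $H_3 = H/8$ and $W_1 = W/32$, $W_2 = W/16$, $W_3 = W/8$, where $H$ and $W$ are the input image height and width.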
46
+
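
The strides of 32, 16, and 8 above translate directly into feature-map sizes and token counts per decoder level. The helper below works this out; it assumes (as is common for this architecture) that inputs are padded up to a multiple of 32 before feature extraction, and the function names are hypothetical.

```python
from typing import List, Tuple

STRIDES = (32, 16, 8)  # H_1 = H/32, H_2 = H/16, H_3 = H/8 (same for W)


def padded(size: int, multiple: int = 32) -> int:
    """Round a side length up to the next multiple (assumed padding step)."""
    return -(-size // multiple) * multiple


def feature_map_sizes(height: int, width: int) -> List[Tuple[int, int]]:
    """Spatial sizes (H_l, W_l) of the multi-scale features, coarsest first."""
    h, w = padded(height), padded(width)
    return [(h // s, w // s) for s in STRIDES]


def tokens_per_level(height: int, width: int) -> List[int]:
    """Number of flattened positions (H_l * W_l) each level contributes to K_l and V_l."""
    return [h * w for h, w in feature_map_sizes(height, width)]
```

For a 512x512 input this gives maps of 16x16, 32x32, and 64x64, i.e. 256, 1,024, and 4,096 positions per level.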
47
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
48
+
49
+ ## **Software Integration:**
50
+
51
+ **Runtime Engine(s):**
52
+ PyTorch
53
+
54
+ **Supported Hardware Microarchitecture Compatibility:**
55
+
56
+ * NVIDIA Ampere
57
+ * NVIDIA Blackwell
58
+ * NVIDIA Hopper
59
+ * NVIDIA Lovelace
60
+
61
+ **[Preferred/Supported] Operating System(s):**
62
+ Linux
63
+
64
+ ## **Model Version(s):**
65
+
66
+ V1
67
+
68
+ ## **Training, Testing, and Evaluation Datasets:**
69
+
70
+ Mask2Former was trained, tested, and evaluated using an internal NVIDIA AV dataset.
71
+
72
+ | Dataset names | Size and content | Training partition | Test partition |
73
+ | :---- | :---- | :---- | :---- |
74
+ | Internal NVIDIA AV dataset | Posed images of 278k objects | 83% (cross validation) | 17% |
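
Applied to about 278k objects, the 83%/17% partition above corresponds to roughly 230.7k training and 47.3k test objects. The card does not describe how the partition was drawn; one common way to make such a split reproducible and free of leakage between partitions is to assign whole objects (not individual images) by hashing an object ID. The sketch below illustrates that technique under this assumption; `split_of` is a hypothetical helper, and the cross-validation scheme is not modeled.

```python
import hashlib

TRAIN_FRACTION = 0.83  # 83% train / 17% test, as in the table above


def split_of(object_id: str, train_fraction: float = TRAIN_FRACTION) -> str:
    """Assign an object to 'train' or 'test' deterministically from its ID."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    # Keep 53 bits so the value maps exactly onto a float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "train" if bucket < train_fraction else "test"
```

Because the assignment depends only on the ID, every image of an object lands in the same partition, and the split is stable across runs and machines.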
75
+
76
+ ### Internal NVIDIA AV dataset
77
+
78
+ **Link:** N/A
79
+
80
+ **Data Collection Method:** Sensors
81
+
82
+ **Labeling Method by Dataset:** Automated. The labels we collected are binary masks of objects in the images.
83
+
84
+ **Properties**: This dataset was collected using sensors mounted on the NVIDIA fleet and was auto-labeled using a third-party tool to ensure high-quality annotations.
85
+
86
+ ## **Inference:**
87
+
88
+ **Engine:**
89
+ PyTorch
90
+
91
+ **Test Hardware:**
92
+ A6000
93
+
94
+ ## **Ethical Considerations:**
95
+
96
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
97
+
98
+ For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
99
+
100
+ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
101
+
102
+ **Bias**
103
+
104
+ | Field | Response |
105
+ | :---- | :---- |
106
+ | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
107
+ | Measures taken to mitigate against unwanted bias: | None |
108
+
109
+ **Explainability**
110
+
111
+ | Field | Response |
112
+ | :---- | :---- |
113
+ | Intended Domain | Advanced Driver Assistance Systems |
114
+ | Model Type: | Object detection and Instance segmentation |
115
+ | Intended Users: | Autonomous Vehicles developers enhancing and improving Neural Reconstruction pipelines. |
116
+ | Output | Image Segmentation |
117
+ | Describe how the model works | The model takes an image as input and outputs segmentation masks for the objects in the image |
118
+ | Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
119
+ | Technical Limitations | The system does not guarantee a 100% success rate. The model was trained mostly on vehicles, so it is not expected to perform well on pedestrians, cyclists, or other non-vehicular objects, and it struggles with small objects. |
120
+ | Verified to have met prescribed NVIDIA quality standards | Yes |
121
+ | Performance Metrics | Intersection over Union (IOU) |
122
+ | Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
123
+ | Licensing | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
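
IoU, the performance metric listed above, is the number of pixels where the predicted and reference masks overlap divided by the number of pixels covered by either mask. The pure-Python sketch below assumes both masks are flattened to equal-length boolean lists; treating two empty masks as a perfect match is a convention chosen here, not something the card specifies.

```python
from typing import Sequence


def mask_iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection over Union of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return 1.0 if union == 0 else inter / union  # both empty: perfect match by convention
```

For example, masks `[True, True, False, False]` and `[True, False, True, False]` share one pixel out of three covered pixels, so their IoU is 1/3.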
124
+
125
+ **Privacy**
126
+
127
+ | Field | Response |
128
+ | :---- | :---- |
129
+ | Generatable or reverse engineerable personal data? | No |
130
+ | Personal data used to create this model? | No |
131
+ | How often is the dataset reviewed? | Before release |
132
+ | Is there provenance for all datasets used in training? | Yes |
133
+ | Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
134
+ | Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
135
+ | Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
136
+
137
+ **Safety & Security**
138
+
139
+ | Field | Response |
140
+ | :---- | :---- |
141
+ | Model Application(s): | Object detection and Segmentation |
142
+ | Describe the life critical impact (if present). | N/A \- The model should not be deployed in a vehicle to perform life-critical tasks. |
143
+ | Use Case Restrictions: | The use of the model is governed by the [NVIDIA Software and Model Evaluation License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
144
+ | Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
model_cards/MultviewDiffusion.md ADDED
@@ -0,0 +1,169 @@
1
+ # Multiview Diffusion (Sana-based) | Model Card
2
+
3
+ ## **Description:**
4
+
5
+ The multiview diffusion model was trained on AV object images using SANA as the base model. The model is conditioned on image input and outputs images of the same object from different viewpoints. It does not support text input.
6
+
7
+ This model is used as part of the Asset Harvester GA.
8
+
9
+ ### **License/Terms of Use:**
10
+
11
+ ### Governing Terms: Use of this model system is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
12
+
13
+ ### **Deployment Geography:**
14
+
15
+ Global
16
+
17
+ ### **Use Case:**
18
+
19
+ The multiview diffusion model takes a set of posed images as input and outputs 16 images of the same input vehicle from different viewpoints. These 16 output images serve as input to three-dimensional (3D) reconstruction, which generates the 3D assets.
20
+
21
+ ### **Release Date:**
22
+
23
+ HuggingFace
24
+
25
+ ## **Reference(s):**
26
+
27
+ **Asset-Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation.** *NVIDIA white paper.*
28
+ \[later we replace it with our paper link\]
29
+
30
+ ## **Model Architecture:**
31
+
32
+ **Architecture Type:** Linear Diffusion Transformer
33
+
34
+ **Network Architecture:** Linear-attention Diffusion Transformer with a Deep Compression Autoencoder (DC-AE) for efficient high-resolution image generation, with C-RADIO providing the image conditioning signal.
35
+
36
+ ## **Input:**
37
+
38
+ **Input Type(s):** Up to 4 Images (Adjustable via config parameter)
39
+
40
+ **Input Format(s):** Red, Green, Blue (RGB)
41
+
42
+ **Input Parameters:** Two-Dimensional (2D)
43
+
44
+ **Other Properties Related to Input:** Camera matrices of images
45
+
46
+ ## **Output:**
47
+
48
+ **Output Type(s):** 16 Images
49
+
50
+ **Output Format(s):** Red, Green, Blue (RGB)
51
+
52
+ **Output Parameters:** Two-Dimensional (2D)
53
+
54
+ **Other Properties Related to Output:** Camera poses of images
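
The card states only that the 16 output images come with camera poses; it does not describe their layout. Purely as an illustration of what such poses look like, the sketch below builds 16 look-at camera-to-world matrices evenly spaced in azimuth around an object at the origin. The even spacing, the fixed radius and elevation, the z-up world, the OpenCV-style camera axes (x right, y down, z forward), and all function names are assumptions, not the model's actual view configuration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _sub(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _normalize(a: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]


def look_at_c2w(cam_pos: Sequence[float], target=(0.0, 0.0, 0.0), world_up=(0.0, 0.0, 1.0)) -> Matrix:
    """4x4 camera-to-world matrix with columns (right, down, forward, position)."""
    forward = _normalize(_sub(target, cam_pos))
    right = _normalize(_cross(forward, world_up))  # undefined if forward is parallel to world_up
    down = _cross(forward, right)                  # already unit length: forward and right are orthonormal
    rows = [[right[i], down[i], forward[i], cam_pos[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def orbit_poses(num_views: int = 16, radius: float = 2.0, elevation_deg: float = 15.0) -> List[Matrix]:
    """Cameras evenly spaced in azimuth on a circle; elevation must lie strictly between -90 and 90."""
    el = math.radians(elevation_deg)
    poses = []
    for k in range(num_views):
        az = 2.0 * math.pi * k / num_views
        pos = [radius * math.cos(el) * math.cos(az),
               radius * math.cos(el) * math.sin(az),
               radius * math.sin(el)]
        poses.append(look_at_c2w(pos))
    return poses
```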
55
+
56
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
57
+
58
+ ## **Software Integration:**
59
+
60
+ **Runtime Engine(s):**
61
+ PyTorch
62
+
63
+ **Supported Hardware Microarchitecture Compatibility:**
64
+ NVIDIA Ampere
65
+
66
+ **[Preferred/Supported] Operating System(s):**
67
+ Linux
68
+
69
+ ## **Model Version(s):**
70
+
71
+ v1
72
+
73
+ ## **Training, Testing, and Evaluation Datasets:**
74
+
75
+ The model was trained, tested, and fine-tuned using a commercially viable Objaverse subset, internal AV data, and Omniverse 3D assets (synthetic images).
76
+
77
+ | Dataset names | Size and content | Training partition | Test partition |
78
+ | :---- | :---- | :---- | :---- |
79
+ | Internal NVIDIA AV dataset | Posed images of 278k objects | 83% (cross validation) | 17% |
80
+ | Omniverse 3D assets | 200 3D assets of objects | 100% | 0% |
81
+ | Objaverse | 80k assets collected under commercially viable Creative Commons licenses | 100% | 0% |
82
+
83
+ ### Objaverse Commercially Viable Subset
84
+
85
+ **Link:** https://objaverse.allenai.org
86
+ **Data Collection Method:** Synthetic 3D assets aggregated from various open-source and licensed sources
87
+ **Labeling Method by Dataset:** Hybrid: Human and Automated
88
+ **Properties:** This dataset consists of a diverse set of over 80,000 synthetic 3D object models spanning everyday items, animals, tools, and complex structures. Each model is rendered into multi-view 2D images with associated camera poses, materials, and mesh properties.
89
+
90
+ ### Internal NVIDIA AV dataset
91
+
92
+ **Data Collection Method:** Sensors
93
+
94
+ **Labeling Method by Dataset:** Human
95
+
96
+ **Properties**: This dataset was collected using sensors mounted on the NVIDIA fleet and was manually labeled by a team of human annotators to ensure high-quality annotations.
97
+
98
+ ### Omniverse 3D assets
99
+
100
+ **Data Collection Method:** Human
101
+
102
+ **Labeling Method by Dataset:** Human
103
+
104
+ **Properties**: This dataset consists of 3D assets created by humans.
105
+
106
+ ## **Inference:**
107
+
108
+ **Engine:** PyTorch>=2.0.0
109
+
110
+ **Test Hardware:**
111
+ We tested on H100, A100, A6000, and RTX 4090. Inference on a single A100 takes 7 seconds to generate 16 images.
112
+
113
+ ## **Ethical Considerations:**
114
+
115
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
116
+
117
+ For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
118
+
119
+ Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
120
+
121
+ **Bias**
122
+
123
+ | Field | Response |
124
+ | :---- | :---- |
125
+ | Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
126
+ | Measures taken to mitigate against unwanted bias: | None |

**Explainability**

| Field | Response |
| :---- | :---- |
| Intended Domain | Advanced Driver Assistance Systems |
| Model Type: | Multiview creation |
| Intended Users: | Autonomous vehicle developers enhancing and improving neural reconstruction pipelines. |
| Output | 16 multiview images |
| Describe how the model works | The model takes one to four input images and outputs 16 multiview images of the vehicle detected in the input images. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
| Technical Limitations | The system does not guarantee a 100% success rate. It cannot fully guarantee the safety and controllability of the generated image content. Additionally, challenges remain in certain complex cases, such as text rendering and the generation of faces and hands. |
| Verified to have met prescribed NVIDIA quality standards | Yes |
| Performance Metrics | Peak signal-to-noise ratio (PSNR), Fréchet Inception Distance (FID), CLIPScore |
| Potential Known Risks | AV and robotics developers should be aware that this model cannot guarantee a 100% success rate. In cases of unsuccessful generation, the output may not possess an accurate real-world representation of the asset and should not be relied upon in safety-critical simulations. |
| Licensing | The use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
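
Of the metrics listed above, PSNR is the simplest to reproduce. The sketch below computes it from the mean squared error as $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. It is a minimal illustration, not the evaluation code used for this model. The function name `psnr`, the flattened-pixel input, and the assumption that pixel values lie in `[0, 1]` are all hypothetical choices for this example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images.

    Assumes pixel values are already scaled to [0, max_value].
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same number of pixels")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [0.0, 0.5, 1.0, 0.25]
    out = [0.0, 0.5, 0.9, 0.25]
    print(f"{psnr(ref, out):.2f} dB")  # prints 26.02 dB
```

Higher is better. Because PSNR averages per-pixel squared errors, it is known to favor blurry but well-aligned outputs, so it is usually reported alongside perceptual metrics.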

**Privacy**

| Field | Response |
| :---- | :---- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | Yes |
| Was consent obtained for any personal data used? | Yes |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | Yes |
| If personal data was collected for the development of the model, was it collected directly by NVIDIA? | No |
| If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | N/A |
| If personal data was collected for the development of this AI model, was it minimized to only what was required? | Yes |
| Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | Yes |
| How often is the dataset reviewed? | Before release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |

**Safety & Security**

| Field | Response |
| :---- | :---- |
| Model Application(s): | Multiview creation |
| Describe the life critical impact (if present). | N/A. The model should not be deployed in a vehicle to perform life-critical tasks. |
| Use Case Restrictions: | The use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
| Model and dataset restrictions: | The principle of least privilege (PoLP) is applied to limit access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to. |

model_cards/TokenGS.md ADDED
@@ -0,0 +1,147 @@
# Object TokenGS | Model Card

## **Description:**

Object TokenGS is a feed-forward neural reconstruction model that takes posed multiview RGB images as input and predicts a 3D Gaussian Splatting (3DGS) representation of the object. TokenGS directly regresses 3D Gaussian centers in global coordinates. It uses learnable Gaussian tokens in an encoder-decoder Transformer, which decouples the number of predicted Gaussians from both the input image resolution and the number of input views.
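
To make the decoupling concrete, the sketch below contrasts token-based counting with a pixel-aligned design. In a pixel-aligned design, a common alternative in feed-forward 3DGS, every input pixel of every view predicts its own Gaussian(s). Several things here are illustrative assumptions: the pixel-aligned baseline, the function names, and the token values in the usage example. This card does not state how many Gaussian tokens TokenGS uses or how many Gaussians each token decodes to.

```python
def pixel_aligned_gaussian_count(num_views: int, height: int, width: int,
                                 gaussians_per_pixel: int = 1) -> int:
    """Hypothetical pixel-aligned baseline: output size grows with views and resolution."""
    return num_views * height * width * gaussians_per_pixel


def token_gaussian_count(num_tokens: int, gaussians_per_token: int = 1) -> int:
    """Token-based design: output size is fixed by the number of learnable Gaussian
    tokens, independent of how many input views there are or how large they are."""
    return num_tokens * gaussians_per_token


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    for views, res in [(4, 512), (16, 512), (16, 1024)]:
        pixel = pixel_aligned_gaussian_count(views, res, res)
        token = token_gaussian_count(num_tokens=4096, gaussians_per_token=8)
        print(f"{views:>2} views @ {res}px: pixel-aligned={pixel:,}  token-based={token:,}")
```

The token-based count stays constant across all three rows, while the pixel-aligned count grows with both the number of views and the square of the resolution.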

### **License/Terms of Use:**

This model is a submodule of [Asset Harvester](https://huggingface.co/nvidia/asset-harvester) and follows its terms of use.

### **Deployment Geography:**

Global

### **Use Case:**

Object TokenGS can be used for multiview 3D object lifting. It takes posed multiview images as input and converts them into 3D Gaussian assets.

### **Release Date:**

- GitHub: 03/16/2026 via [github.com/NVIDIA/asset-harvester](https://github.com/NVIDIA/asset-harvester)
- Hugging Face: 03/16/2026 via [huggingface.co/nvidia/asset-harvester](https://huggingface.co/nvidia/asset-harvester)

## **Reference(s):**

- [Asset-Harvester: Turning Autonomous Driving Logs into 3D Assets for Simulation]()

## **Model Architecture:**

Architecture details are described in the white paper referenced above.

## **Input:**

**Input Type(s):** Image

**Input Format(s):** Red, Green, Blue (RGB) images plus camera parameters

**Input Parameters:** Two-Dimensional (2D) images with camera intrinsics and extrinsics; optional timestamp conditioning for dynamic reconstruction

**Other Properties Related to Input:**

- Each input image is accompanied by its camera intrinsics and extrinsics.
- Images have a resolution of `512 x 512` pixels.
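
The camera intrinsics and extrinsics relate each `512 x 512` image to a shared world frame. As an illustration only, the sketch below projects a world-space point into an image using a standard pinhole model. It assumes OpenCV-style conventions: world-to-camera extrinsics `[R | t]`, a camera looking down +z, and the pixel origin at the top-left. The card does not specify the conventions or parameter layout TokenGS expects, so `project_point` and its arguments are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_point(
    point_world: Sequence[float],
    rotation: Matrix3,             # 3x3 world-to-camera rotation R (assumed convention)
    translation: Sequence[float],  # world-to-camera translation t
    fx: float, fy: float,          # focal lengths in pixels
    cx: float, cy: float,          # principal point in pixels
) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a world point to pixel coordinates (u, v).

    Returns None when the point lies on or behind the camera plane (z <= 0).
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    x, y, z = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    # Perspective divide, then apply the intrinsics.
    return fx * x / z + cx, fy * y / z + cy


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Hypothetical camera 2 units in front of the origin, 512 x 512 image.
    print(project_point((0.5, 0.0, 0.0), identity, (0.0, 0.0, 2.0), 512.0, 512.0, 256.0, 256.0))
    # prints (384.0, 256.0)
```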

## **Output:**

**Output Type(s):** 3D Gaussian Splatting primitives and rendered RGB images

**Output Format(s):** 3DGS parameter tensors (14 scalar attributes per Gaussian primitive), renderable to novel RGB views via a differentiable Gaussian splatting renderer

**Output Parameters:** 14-dimensional (14D) Gaussian attributes

**Other Properties Related to Output:**

Each Gaussian includes:

- Mean or center: `(x, y, z)`
- Color: `(r, g, b)`
- Scale: `(sx, sy, sz)`
- Opacity: `alpha`
- Rotation: quaternion `(qw, qx, qy, qz)`
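
The five attribute groups above add up to the 14 values per Gaussian (3 + 3 + 3 + 1 + 4). The sketch below packs and unpacks one Gaussian in that order. Some details are assumptions for illustration: the ordering, the `Gaussian` class, and the use of raw (already activated) values. The card does not say how TokenGS lays out its output tensor, or whether scale and opacity are stored in log or logit space, for example. The only processing applied is normalizing the quaternion, since a rotation quaternion must have unit length.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

NUM_ATTRIBUTES = 14  # 3 (mean) + 3 (color) + 3 (scale) + 1 (opacity) + 4 (rotation)


@dataclass
class Gaussian:
    mean: Tuple[float, float, float]             # (x, y, z)
    color: Tuple[float, float, float]            # (r, g, b)
    scale: Tuple[float, float, float]            # (sx, sy, sz)
    opacity: float                               # alpha
    rotation: Tuple[float, float, float, float]  # quaternion (qw, qx, qy, qz)

    def to_vector(self) -> list:
        """Flatten into a 14-value list in the (assumed) order listed above."""
        return [*self.mean, *self.color, *self.scale, self.opacity, *self.rotation]

    @classmethod
    def from_vector(cls, values: Sequence[float]) -> "Gaussian":
        """Rebuild a Gaussian from 14 values, normalizing the quaternion."""
        if len(values) != NUM_ATTRIBUTES:
            raise ValueError(f"expected {NUM_ATTRIBUTES} values, got {len(values)}")
        v = [float(x) for x in values]
        qw, qx, qy, qz = v[10:14]
        norm = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
        if norm == 0.0:
            raise ValueError("rotation quaternion must be non-zero")
        return cls(
            mean=(v[0], v[1], v[2]),
            color=(v[3], v[4], v[5]),
            scale=(v[6], v[7], v[8]),
            opacity=v[9],
            rotation=(qw / norm, qx / norm, qy / norm, qz / norm),
        )


if __name__ == "__main__":
    g = Gaussian((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.1, 0.1), 0.9, (2.0, 0.0, 0.0, 0.0))
    print(len(g.to_vector()))                            # prints 14
    print(Gaussian.from_vector(g.to_vector()).rotation)  # prints (1.0, 0.0, 0.0, 0.0)
```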

Our AI models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA hardware and CUDA-enabled software frameworks, the model achieves faster training and inference times compared to CPU-only solutions.

## **Software Integration:**

**Supported Hardware Microarchitecture Compatibility:**

- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Lovelace
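
One way to check whether a local GPU belongs to one of the supported microarchitectures is to map its CUDA compute capability to an architecture family. The sketch below is a hypothetical helper, not part of Asset Harvester. `architecture_for` is an illustrative name, and the table covers only the compute capabilities listed in it; other values return `None`. `torch.cuda.get_device_capability()` is called only if PyTorch with CUDA is installed.

```python
from typing import Optional

# Compute capability (major, minor) -> microarchitecture family.
# Only a few well-known values are listed; extend as needed.
_ARCHITECTURES = {
    (8, 0): "Ampere",      # e.g. A100
    (8, 6): "Ampere",      # e.g. RTX A6000
    (8, 9): "Lovelace",    # Ada Lovelace, e.g. RTX 4090
    (9, 0): "Hopper",      # e.g. H100
    (10, 0): "Blackwell",  # e.g. B200
    (12, 0): "Blackwell",  # e.g. RTX 50-series
}

SUPPORTED = {"Ampere", "Blackwell", "Hopper", "Lovelace"}


def architecture_for(major: int, minor: int) -> Optional[str]:
    """Return the architecture family for a compute capability, or None if unknown."""
    return _ARCHITECTURES.get((major, minor))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        arch = architecture_for(major, minor)
        print(f"compute capability {major}.{minor}: {arch or 'unknown'}; "
              f"listed as supported: {arch in SUPPORTED}")
    else:
        print("PyTorch with CUDA not available; nothing to check")
```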

**Supported Operating System(s):**

- Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

## **Model Version:**

Asset\_Harvester\_GA

## **Training, Testing, and Evaluation Datasets:**

Described in the white paper referenced above.

## **Inference:**

**Acceleration Engine:** PyTorch

**Test Hardware:** NVIDIA A100, H100

## **Ethical Considerations:**

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with the license terms, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please make sure you have proper rights and permissions for all input image and video content. If an image or video includes people, personal health information, or intellectual property, the generated output will not automatically blur those subjects or maintain their proportions.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

**Bias**

| Field | Response |
| :---- | :---- |
| Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
| Measures taken to mitigate against unwanted bias: | None |

**Explainability**

| Field | Response |
| :---- | :---- |
| Intended Task/Domain: | Multi-view 3D object reconstruction. |
| Model Type: | Transformer |
| Intended Users: | 3D vision, simulation, graphics, and robotics or physical AI researchers and developers. |
| Output | 3D Gaussian Splat representation and rendered novel views. |
| Describe how the model works | Encoder-decoder Transformer with learnable Gaussian tokens directly regresses 3D Gaussian attributes from posed images, trained with rendering and visibility losses. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of | None |
| Technical Limitations & Mitigation | TokenGS may miss fine-grained geometric details. Quality depends on camera pose quality and multiview coverage, so users should validate outputs and provide sufficient view diversity and accurate camera metadata. |
| Verified to have met prescribed NVIDIA quality standards | Yes |
| Performance Metrics | PSNR, SSIM, LPIPS; additional comparisons under view extrapolation and camera-noise robustness. |
| Potential Known Risks | Reconstruction failures or incomplete geometry may produce misleading renderings or assets. |
| Licensing | The use of the model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
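
SSIM, the second metric above, compares luminance, contrast, and structure between a rendering and its reference. The sketch below implements the standard SSIM formula with the usual constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$. Unlike the reference definition, which averages the formula over sliding local windows, this version evaluates it once over the whole image, so its numbers will not match library implementations. The function name and flattened grayscale input are assumptions for illustration. LPIPS is omitted because it requires a pretrained network.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM between two flattened grayscale images.

    Simplification: statistics are computed over the whole image rather than
    over local (e.g. Gaussian-weighted) windows.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("images must have the same size and at least two pixels")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    ref = [0.0, 0.25, 0.5, 0.75, 1.0]
    noisy = [0.05, 0.2, 0.55, 0.7, 0.95]
    print(f"identical: {global_ssim(ref, ref):.3f}, perturbed: {global_ssim(ref, noisy):.3f}")
```

SSIM equals 1 only for identical images and can become negative when the two images are anti-correlated.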

**Privacy**

| Field | Response |
| :---- | :---- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| Was consent obtained for any personal data used? | Not Applicable |
| How often is the dataset reviewed? | Before release |
| Is a mechanism in place to honor data subject right of access or deletion of personal data? | Not Applicable |
| If personal data was collected for the development of the model, was it collected directly by NVIDIA? | Not Applicable |
| If personal data was collected for the development of the model by NVIDIA, do you maintain or have access to disclosures made to data subjects? | Not Applicable |
| If personal data was collected for the development of this AI model, was it minimized to only what was required? | Not Applicable |
| Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes |
| Applicable Privacy Policy | [https://www.nvidia.com/en-us/about-nvidia/privacy-policy/](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |

**Safety & Security**

| Field | Response |
| :---- | :---- |
| Model Application(s): | 3D object reconstruction |
| Describe the life critical impact (if present). | Not Applicable. The model is not intended for direct life-critical decision-making, and outputs should not be used as the sole basis for autonomous vehicle perception, robotics control, or operational safety decisions. Additional validation and testing should be incorporated prior to deployment in real-world production. |
| Use Case Restrictions: | Abide by [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
| Model and dataset restrictions: | The principle of least privilege (PoLP) is applied to limit access for dataset generation and model development. Restrictions enforce dataset access during training. |