To enable programmatic control over its objects, the Dataprep Platform exposes a range of REST API endpoints. This section provides an overview of the API design, methods, and supported use cases.
Most of the endpoints accept JSON as input and return JSON responses.
This means that you must usually add the following headers to your request:
Content-type: application/json
Accept: application/json
ℹ️ NOTE: Access to APIs must be enabled on a per-project basis. For more information, see Enable API.
Version: 10.1.0+2986503.20230707151310.f990a1da
The term resource refers to a single type of object in the Dataprep Platform metadata. The API is organized by resource, with each endpoint grouped under its corresponding resource.
The name of a resource is typically plural, and expressed in camelCase. Example: jobGroups.
Resource names are used as part of endpoint URLs, as well as in API parameters and responses.
The platform supports Create, Read, Update, and Delete operations on most resources. You can review the standards for these operations and their standard parameters below.
Some endpoints have special behavior as exceptions.
To create a resource, you typically submit an HTTP POST request with the resource's required metadata in the request body.
The response returns a 201 Created response code upon success, with the resource's metadata, including its internal id, in the response body.
An HTTP GET request can be used to read a resource or to list a number of resources.
A resource's id can be submitted in the request parameters to read a specific resource.
The response usually returns a 200 OK response code upon success, with the resource's metadata in the response body.
If a GET request does not include a specific resource id, it is treated as a list request.
The response usually returns a 200 OK response code upon success, with an object containing a list of resources' metadata in the response body.
When reading resources, some common query parameters are usually available. For example:
/v4/jobGroups?limit=100&includeDeleted=true&embed=jobs
Query Parameter | Type | Description |
---|---|---|
embed | string | Comma-separated list of objects to include as part of the response. See Embedding resources. |
includeDeleted | string | If set to true, the response includes deleted objects. |
limit | integer | Maximum number of objects to fetch. Usually 25 by default. |
offset | integer | Offset after which to start returning objects. For use with the limit query parameter. |
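As a rough illustration of how limit and offset combine, the Python sketch below builds a list URL and walks through pages. It assumes that list responses wrap their items in a "data" array (the envelope described under the conventions in this document) and that a page shorter than the limit marks the end of the results. fetch_page is a hypothetical callable you would supply to perform the actual GET request.

```python
from urllib.parse import urlencode


def list_url(resource, **params):
    """Return a relative list URL such as /v4/jobGroups?limit=100."""
    query = urlencode(params)
    return f"/v4/{resource}" + (f"?{query}" if query else "")


def iter_resources(fetch_page, limit=25):
    """Yield every item by advancing offset one page at a time.

    fetch_page(limit=..., offset=...) is a caller-supplied (hypothetical)
    function that performs the GET and returns the parsed JSON body.
    """
    offset = 0
    while True:
        items = fetch_page(limit=limit, offset=offset).get("data", [])
        yield from items
        if len(items) < limit:  # assumption: a short page means no more results
            break
        offset += limit
```

Calling list_url("jobGroups", limit=100, includeDeleted="true", embed="jobs") reproduces the example URL above. Pass boolean flags as the string "true", since urlencode would otherwise render the Python value True with a capital letter.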
Updating a resource requires the resource id, and is typically done using an HTTP PUT or PATCH request, with the fields to modify in the request body.
The response usually returns a 200 OK response code upon success, with minimal information about the modified resource in the response body.
Deleting a resource requires the resource id and is typically executed via an HTTP DELETE request. The response usually returns a 204 No Content response code upon success.
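The four operations can be summarized in a small lookup, sketched here in Python. The method and status pairs come from the descriptions above. Placing the id as a trailing path segment is an assumption based on the /v4/jobGroups/1 example used elsewhere in this document, and plan_request is a hypothetical helper name. Some endpoints behave differently, so treat this as a starting point only.

```python
# Method and usual success code per operation, as described above.
CRUD = {
    "create": ("POST", 201),
    "read": ("GET", 200),
    "list": ("GET", 200),
    "update": ("PUT", 200),  # some endpoints use PATCH instead
    "delete": ("DELETE", 204),
}


def plan_request(operation, resource, resource_id=None):
    """Return (method, path, expected_status) for a CRUD operation."""
    method, expected = CRUD[operation]
    needs_id = operation in ("read", "update", "delete")
    if needs_id and resource_id is None:
        raise ValueError(f"{operation} requires a resource id")
    path = f"/v4/{resource}"
    if needs_id:
        path += f"/{resource_id}"  # assumption: id is a trailing path segment
    return method, path, expected
```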
Resource names are plural and expressed in camelCase.
Resource names are consistent between main URL and URL parameter.
Parameter lists are consistently enveloped in the following manner:
{ "data": [{ ... }] }
Field names are in camelCase and are consistent with the resource name in the URL or with the embed URL parameter.
"creator": { "id": 1 },
"updater": { "id": 2 },
When reading a resource, the platform supports an embed query parameter for most resources, which allows the caller to ask for associated resources in the response.
Use of this parameter requires knowledge of how different resources are related to each other and is suggested for advanced users only.
In the following example, the sub-jobs of a jobGroup are embedded in the response for jobGroup=1:
https://api.clouddataprep.com/v4/jobGroups/1?embed=jobs
If you provide an invalid embedding, you will get an error message. The response will contain the list of possible resources that can be embedded. e.g.
https://api.clouddataprep.com/v4/jobGroups/1?embed=*
Example error:
{
"exception": {
"name": "ValidationFailed",
"message": "Input validation failed",
"details": "No association * in flows! Valid associations are creator, updater, snapshots..."
}
}
To improve the performance of the endpoints, you can tell the application that you need less data by using the fields query parameter. For example:
https://api.clouddataprep.com/v4/flows?fields=id;name
The list of fields needs to be separated by semicolons (;). Note that the application might sometimes return more fields than requested.
You can also use it while embedding resources.
https://api.clouddataprep.com/v4/flows?fields=id;name&embed=flownodes(fields=id)
You can limit the number of embedded resources, and sort them, for some associations. For example:
https://api.clouddataprep.com/v4/flows?fields=id&embed=flownodes(limit=1,fields=id,sort=-id)
Note that not all associations support this. An error is returned when it is not possible to limit the number of embedded results.
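The fields and embed syntax shown above can be assembled programmatically. The following Python sketch is one way to do it. embed_clause and resource_url are hypothetical helper names, and the output is left unencoded so that it mirrors the URLs in this document. Your HTTP client may percent-encode the punctuation for you.

```python
BASE_URL = "https://api.clouddataprep.com/v4"  # host used in this document's examples


def embed_clause(association, **options):
    """Format one embed entry, e.g. flownodes(limit=1,fields=id,sort=-id)."""
    if not options:
        return association
    inner = ",".join(f"{key}={value}" for key, value in options.items())
    return f"{association}({inner})"


def resource_url(resource, fields=(), embeds=()):
    """Build a read URL with optional fields and embed parameters."""
    params = []
    if fields:
        params.append("fields=" + ";".join(fields))  # semicolon-separated
    if embeds:
        params.append("embed=" + ",".join(embeds))  # comma-separated
    return f"{BASE_URL}/{resource}" + ("?" + "&".join(params) if params else "")
```

For instance, resource_url("flows", fields=["id"], embeds=[embed_clause("flownodes", limit=1, fields="id", sort="-id")]) yields the last example URL above.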
The Dataprep Platform uses HTTP response codes to indicate the success or failure of an API request.
HTTP Status Code (client errors) | Notes |
---|---|
400 Bad Request | Potential reasons: |
403 Forbidden | Incorrect permissions to access the Resource. |
404 Not Found | Resource cannot be found. |
410 Gone | Resource has been previously deleted. |
415 Unsupported Media Type | Incorrect Accept or Content-type header |
Each request has a request identifier, which can be found in the response headers, in the following form:
x-trifacta-request-id: <myRequestId>
ℹ️ NOTE: If you have an issue with a specific request, please include the x-trifacta-request-id value when you contact support.
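When logging failures, it can help to capture the request identifier together with the error body. The Python sketch below assumes the error body follows the "exception" shape shown in the example errors in this document. describe_error is a hypothetical helper, and header names are matched case-insensitively.

```python
import json


def describe_error(status, headers, body_text):
    """Summarize a failed response, including the request id for support."""
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("x-trifacta-request-id", "unknown")
    try:
        parsed = json.loads(body_text)
    except ValueError:  # body was empty or not JSON
        parsed = None
    exc = parsed.get("exception") if isinstance(parsed, dict) else None
    if not isinstance(exc, dict):
        exc = {}
    name = exc.get("name", "UnknownError")
    message = exc.get("message", "")
    details = exc.get("details", "")
    return f"HTTP {status} {name}: {message} | {details} | request id: {request_id}"
```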
The Dataprep Platform applies a per-minute limit to the number of requests received by the API for some endpoints.
Users who send too many requests receive an HTTP status code 429 error response.
For applicable endpoints, the quota is documented under the endpoint description.
Treat these limits as maximums and don't try to generate unnecessary load. Notes:
If you need to trigger many requests in a short interval, you can watch for the 429 status code and build a retry mechanism.
The retry mechanism should follow an exponential backoff schedule to reduce request volume. Adding some randomness to the backoff schedule is recommended.
For endpoints that are subject to low rate limits, headers are included in the response to indicate how many requests are left for the current interval. You can use these to avoid blindly retrying.
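A retry mechanism along these lines could look like the following Python sketch. It uses exponential backoff with "full jitter" (a random delay between zero and the exponential ceiling), which is one common way to add randomness and not something this document prescribes. send is a hypothetical caller-supplied function returning (status, headers, body), and the base, cap, and attempt values are illustrative defaults.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with full jitter: rng() * min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a caller-supplied (hypothetical) function returning
    (status, headers, body). The last response is returned either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:  # no need to wait after the final try
            sleep(backoff_delay(attempt, rng=rng))
    return status, headers, body
```

The sleep and rng parameters exist mainly so the schedule can be exercised in tests without real waiting.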
Example response headers for an endpoint limited to 30 requests/user/min and 60 requests/workspace/min:
Header name | Description |
---|---|
x-rate-limit-user-limit | The maximum number of requests you're permitted to make per user per minute (e.g. 30) |
x-rate-limit-user-remaining | The number of requests remaining in the current rate limit window (e.g. 28) |
x-rate-limit-user-reset | The time at which the current rate limit window resets in UTC epoch milliseconds (e.g. 1631095033096) |
x-rate-limit-workspace-limit | The maximum number of requests you're permitted to make per workspace per minute (e.g. 60) |
x-rate-limit-workspace-remaining | The number of requests remaining in the current rate limit window (e.g. 38) |
x-rate-limit-workspace-reset | The time at which the current rate limit window resets in UTC epoch milliseconds (e.g. 1631095033096) |
x-retry-after | Number of seconds until the current rate limit window resets (e.g. 42) |
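To avoid retrying blindly, a client can inspect these headers before sending the next request. The Python sketch below reads the remaining counters and x-retry-after. quota_status and should_pause are hypothetical helper names, and the sketch assumes the header values are plain integers as in the examples.

```python
def quota_status(headers):
    """Return (remaining, retry_after_seconds) from rate-limit headers.

    remaining is the smaller of the user and workspace counters, or None
    when neither header is present. retry_after_seconds is None when the
    x-retry-after header is absent.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    counters = [
        int(lowered[key])
        for key in ("x-rate-limit-user-remaining", "x-rate-limit-workspace-remaining")
        if key in lowered
    ]
    remaining = min(counters) if counters else None
    retry_after = lowered.get("x-retry-after")
    return remaining, (float(retry_after) if retry_after is not None else None)


def should_pause(headers):
    """True when the headers report that no requests are left in the window."""
    remaining, _ = quota_status(headers)
    return remaining is not None and remaining <= 0
```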
If you exceed the rate limit, an error response is returned:
curl -i -X POST 'https://api.clouddataprep.com/v4/jobGroups' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{ "wrangledDataset": { "id": "<recipe-id>" } }'
HTTP/1.1 429 Too Many Requests
x-rate-limit-user-limit: 30
x-rate-limit-user-remaining: 0
x-rate-limit-user-reset: 1631096271696
x-retry-after: 57
{
"exception": {
"name": "TooManyRequestsException",
"message": "Too Many Requests",
"details": "API quota reached for \"runJobGroup\". Wait 57 seconds before making a new request. (Max. 30 requests allowed per minute per user.)"
}
}
You can use a third-party client, such as curl, HTTPie, Postman, or the Insomnia REST client, to test the Dataprep API.
⚠️ When testing the API, bear in mind that you are working with your live production data, not sample data or test data.
Note that you will need to pass an API token with each request.
For example, here is how to run a job with curl:
curl -X POST 'https://api.clouddataprep.com/v4/jobGroups' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{ "wrangledDataset": { "id": "<recipe-id>" } }'
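For readers who prefer a script to a command line, the same request can be assembled with Python's standard library. This is a minimal sketch that only builds the request. build_run_job_request is a hypothetical helper name, and actually sending the request runs a job against live data.

```python
import json
import urllib.request

API_URL = "https://api.clouddataprep.com/v4/jobGroups"


def build_run_job_request(token, recipe_id):
    """Build (without sending) the POST request shown in the curl example."""
    body = json.dumps({"wrangledDataset": {"id": recipe_id}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# To actually send it (this runs a job against live data):
# with urllib.request.urlopen(build_run_job_request(token, recipe_id)) as resp:
#     print(resp.status, json.load(resp))
```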
Using a graphical tool such as Postman or Insomnia, it is possible to import the API specifications directly.
Note that with Postman, you can also generate code snippets by selecting a request and clicking on the Code button.
ℹ️ NOTE: Each request to the Dataprep Platform must include authentication credentials.
API access tokens can be acquired and applied to your requests to obscure sensitive Personally Identifiable Information (PII) and are compliant with common privacy and security standards. These tokens last for a preconfigured time period and can be renewed as needed.
You can create and delete access tokens through the Settings area of the application. With each request, you submit the token as part of the Authorization header.
Authorization: Bearer <tokenValue>
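One way to keep the token out of source code is to read it from the environment when building the header. The Python sketch below does that. The variable name DATAPREP_API_TOKEN is an assumption made for illustration and is not defined by the platform.

```python
import os


def auth_headers(env_var="DATAPREP_API_TOKEN"):
    """Return the Authorization header, reading the token from the environment.

    DATAPREP_API_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to an API access token first")
    return {"Authorization": f"Bearer {token}"}
```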
As needed, you can create and use additional tokens. There is no limit to the number of tokens you can create. See Manage API Access Tokens for more information.
ℹ️ NOTE: You must be a project owner to create access tokens.
Security Scheme Type | HTTP |
---|---|
HTTP Authorization Scheme | bearer |
An object representing Dataprep's connection to an external data source. Connections can be used for import, publishing, or both, depending on type.
Create a new connection
ref: createConnection
Field | Type | Description |
---|---|---|
vendor (required) | string | String identifying the connection's vendor |
vendorName (required) | string | Name of the vendor of the connection |
type (required) | string | Type of connection. Enum: "jdbc" "rest" "remotefile" |
credentialType (required) | string | Enum: "basic" "securityToken" "iamRoleArn" "iamDbUser" "oauth2" "keySecret" "apiKey" "awsKeySecret" "basicWithAppToken" "userWithApiToken" "basicApp" "transactionKey" "password" "apiKeyWithToken" "noAuth" "httpHeaderBasedAuth" "privateApp" "httpQueryBasedAuth" |
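Before submitting a connection, a client might check the payload against the required fields and enums listed above. The Python sketch below covers only the four fields shown here, and the full schema may require more. connection_payload_problems is a hypothetical helper, not part of the API.

```python
REQUIRED_FIELDS = ("vendor", "vendorName", "type", "credentialType")
CONNECTION_TYPES = {"jdbc", "rest", "remotefile"}
CREDENTIAL_TYPES = {
    "basic", "securityToken", "iamRoleArn", "iamDbUser", "oauth2", "keySecret",
    "apiKey", "awsKeySecret", "basicWithAppToken", "userWithApiToken",
    "basicApp", "transactionKey", "password", "apiKeyWithToken", "noAuth",
    "httpHeaderBasedAuth", "privateApp", "httpQueryBasedAuth",
}


def connection_payload_problems(payload):
    """Return a list of human-readable problems; empty when none are found."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in payload]
    if "type" in payload and payload["type"] not in CONNECTION_TYPES:
        problems.append(f"unsupported type: {payload['type']}")
    if "credentialType" in payload and payload["credentialType"] not in CREDENTIAL_TYPES:
        problems.append(f"unsupported credentialType: {payload['credentialType']}")
    return problems
```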