Dataprep Standard API (v10.1.0)

Download OpenAPI specification:Download

API support: support@trifacta.com URL: https://trifacta.com License: Dataprep license Terms of Service

Overview

The Dataprep Platform exposes a range of REST API endpoints that give you programmatic control over its objects. This section provides an overview of the API design, methods, and supported use cases.

Most of the endpoints accept JSON as input and return JSON responses. This means that you must usually add the following headers to your request:

Content-Type: application/json
Accept: application/json

ℹ️ NOTE: Access to APIs must be enabled on a per-project basis. For more information, see Enable API.

Version: 10.1.0+2986503.20230707151310.f990a1da

Resources

The term resource refers to a single type of object in the Dataprep Platform metadata. The API is organized by resource: each endpoint belongs to the resource it operates on. The name of a resource is typically plural and expressed in camelCase. Example: jobGroups.

Resource names are used as part of endpoint URLs, as well as in API parameters and responses.

CRUD Operations

The platform supports Create, Read, Update, and Delete operations on most resources. You can review the standards for these operations and their standard parameters below.

Some endpoints have special behavior as exceptions.

Create

To create a resource, you typically submit an HTTP POST request with the resource's required metadata in the request body. The response returns a 201 Created response code upon success with the resource's metadata, including its internal id, in the response body.

Read

An HTTP GET request can be used to read a resource or to list a number of resources.

A resource's id can be submitted in the request parameters to read a specific resource. The response usually returns a 200 OK response code upon success, with the resource's metadata in the response body.

If a GET request does not include a specific resource id, it is treated as a list request. The response usually returns a 200 OK response code upon success, with an object containing a list of resources' metadata in the response body.

When reading resources, some common query parameters are usually available, for example:

/v4/jobGroups?limit=100&includeDeleted=true&embed=jobs

Query Parameter	Type	Description
embed	string	Comma-separated list of objects to include as part of the response. See Embedding Resources.
includeDeleted	string	If set to `true`, the response includes deleted objects.
limit	integer	Maximum number of objects to fetch. Usually 25 by default.
offset	integer	Offset after which to start returning objects. For use with limit query parameter.
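
As a rough sketch of how `limit` and `offset` can be combined to page through a list endpoint, the helper below builds the query string and keeps requesting pages until one comes back short. `fetch_page` is a hypothetical callable standing in for whatever HTTP client you use; it is assumed to take a URL and return the parsed JSON envelope with a `data` list. The function names are illustrative and not part of the API.

```python
from urllib.parse import urlencode


def build_list_url(base, resource, limit=25, offset=0, embed=None, include_deleted=False):
    """Build a list URL such as <base>/v4/jobGroups?limit=100&offset=0&embed=jobs."""
    params = {"limit": limit, "offset": offset}
    if embed:
        params["embed"] = ",".join(embed)  # embed is a comma-separated list
    if include_deleted:
        params["includeDeleted"] = "true"
    # safe="," keeps the embed separator readable instead of encoding it as %2C
    return f"{base}/v4/{resource}?{urlencode(params, safe=',')}"


def iter_resources(fetch_page, base, resource, limit=25, **kwargs):
    """Yield every object from a list endpoint, one page at a time."""
    offset = 0
    while True:
        page = fetch_page(build_list_url(base, resource, limit, offset, **kwargs))
        items = page.get("data", [])
        yield from items
        if len(items) < limit:  # a short page means nothing is left to fetch
            return
        offset += limit
```

Stopping on a short page avoids needing a total count up front, at the cost of one extra request when the number of objects is an exact multiple of `limit`.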

Update

Updating a resource requires the resource id, and is typically done using an HTTP PUT or PATCH request, with the fields to modify in the request body. The response usually returns a 200 OK response code upon success, with minimal information about the modified resource in the response body.

Delete

Deleting a resource requires the resource id and is typically executed via an HTTP DELETE request. The response usually returns a 204 No Content response code upon success.

Conventions

Resource names are plural and expressed in camelCase.
Resource names are consistent between main URL and URL parameter.
Parameter lists are consistently enveloped in the following manner:
```
{ "data": [{ ... }] }
```
Field names are in camelCase and are consistent with the resource name in the URL or with the embed URL parameter.
```
"creator": { "id": 1 },
"updater": { "id": 2 },
```

Embedding Resources

When reading a resource, the platform supports an embed query parameter for most resources, which allows the caller to ask for associated resources in the response. Use of this parameter requires knowledge of how different resources are related to each other and is suggested for advanced users only.

In the following example, the sub-jobs of a jobGroup are embedded in the response for jobGroup=1:

https://api.clouddataprep.com/v4/jobGroups/1?embed=jobs

If you provide an invalid embedding, you get an error message that lists the resources that can be embedded. For example:

https://api.clouddataprep.com/v4/jobGroups/1?embed=*

Example error:

{
  "exception": {
    "name": "ValidationFailed",
    "message": "Input validation failed",
    "details": "No association * in flows! Valid associations are creator, updater, snapshots..."
  }
}

Fields

To improve the performance of an endpoint, you can use the fields query parameter to tell the application that you need only a subset of the data. For example:

https://api.clouddataprep.com/v4/flows?fields=id;name

The fields must be separated by semicolons (;). Note that the application might sometimes return more fields than requested.

You can also use it while embedding resources.

https://api.clouddataprep.com/v4/flows?fields=id;name&embed=flownodes(fields=id)

Limit and sorting

You can limit and sort the number of embedded resources for some associations. e.g.

https://api.clouddataprep.com/v4/flows?fields=id&embed=flownodes(limit=1,fields=id,sort=-id)

Note that not all associations support this. An error is returned when it is not possible to limit the number of embedded results.
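
The `fields` and `embed` syntax shown above can be assembled programmatically. The following is a minimal sketch; the helper names are hypothetical, and passing more than one field inside an embedded association is an assumption, since the examples above only show a single field there. The output is not percent-encoded, so your HTTP client may need to encode it.

```python
def embed_spec(association, fields=None, limit=None, sort=None):
    """Render one embedded association, e.g. flownodes(limit=1,fields=id,sort=-id)."""
    options = []
    if limit is not None:
        options.append(f"limit={limit}")
    if fields:
        options.append("fields=" + ";".join(fields))  # fields are semicolon-separated
    if sort:
        options.append(f"sort={sort}")
    return f"{association}({','.join(options)})" if options else association


def fields_and_embed_query(fields, embeds):
    """Combine top-level fields with a comma-separated list of embed specs."""
    return "fields=" + ";".join(fields) + "&embed=" + ",".join(embeds)


query = fields_and_embed_query(["id"], [embed_spec("flownodes", fields=["id"], limit=1, sort="-id")])
```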

Errors

The Dataprep Platform uses HTTP response codes to indicate the success or failure of an API request.

Codes in the 2xx range indicate success.
Codes in the 4xx range indicate that the information provided is invalid (invalid parameters, missing permissions, etc.)
Codes in the 5xx range indicate an error on the servers. These are rare and should usually go away when retrying. If you experience a lot of 5xx errors, contact support.

HTTP Status Code (client errors)	Notes
400 Bad Request	Potential reasons: resource doesn't exist; request is incorrectly formatted; request contains invalid values.
403 Forbidden	Incorrect permissions to access the Resource.
404 Not Found	Resource cannot be found.
410 Gone	Resource has been previously deleted.
415 Unsupported Media Type	Incorrect `Accept` or `Content-type` header
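
A client can branch on these ranges before looking at the response body. The sketch below is one possible way to do that; the category labels and function names are made up for illustration, and the `exception` body shape is taken from the error examples in this document.

```python
def classify_status(status):
    """Map an HTTP status code to a coarse category."""
    if 200 <= status < 300:
        return "success"
    if status == 429:
        return "rate_limited"  # handled separately, see Rate limiting
    if 400 <= status < 500:
        return "client_error"  # fix the request before retrying
    if 500 <= status < 600:
        return "server_error"  # usually worth retrying
    return "other"


def error_summary(body):
    """Flatten an error body of the form {"exception": {...}} into one line."""
    exc = body.get("exception", {})
    parts = (exc.get("name"), exc.get("message"), exc.get("details"))
    return ": ".join(part for part in parts if part)
```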

Request Ids

Each request has a request identifier, which can be found in the response headers, in the following form:

x-trifacta-request-id: <myRequestId>

ℹ️ NOTE: If you have an issue with a specific request, please include the x-trifacta-request-id value when you contact support.

Versioning and Endpoint Lifecycle

API versioning is not synchronized to specific releases of the platform.
APIs are designed to be backward compatible.
Any changes to the API will first go through a deprecation phase.

Rate limiting

The Dataprep Platform applies a per-minute limit to the number of requests received by the API for some endpoints. Users who send too many requests receive an HTTP 429 status code in the response. For applicable endpoints, the quota is documented under the endpoint description.

Treat these limits as maximums and don't try to generate unnecessary load. Notes:

Limits may be changed or reduced at any time to prevent abuse.
Some endpoints may queue requests if the rate-limit is reached.
If you have special rate requirements, please contact Support.

Handling rate limiting

If you need to trigger many requests in a short interval, you can watch for the 429 status code and build a retry mechanism. The retry mechanism should follow an exponential backoff schedule to reduce request volume. Adding some randomness to the backoff schedule is recommended.
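
One way to implement such a retry mechanism is exponential backoff with "full jitter", where each delay is a random value between zero and an exponentially growing ceiling. The sketch below makes several assumptions: `send` is a hypothetical callable that performs the request and returns a `(status, body)` tuple, and the base delay, cap, and retry count are arbitrary starting points rather than documented values.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=60.0, rng=random.random):
    """Yield one delay per retry: a random fraction of min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


def call_with_retry(send, max_retries=5, sleep=time.sleep, **backoff):
    """Call send() again after each 429 until it succeeds or retries run out."""
    status, body = send()
    for delay in backoff_delays(max_retries, **backoff):
        if status != 429:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

The `sleep` and `rng` parameters exist only so that the schedule can be exercised without real waiting.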

Response headers

For endpoints that are subject to low rate limits, the response includes headers that indicate how many requests are left for the current interval. You can use these to avoid blindly retrying.

Example response headers for an endpoint limited to 30 requests/user/min and 60 requests/workspace/min

Header name	Description
`x-rate-limit-user-limit`	The maximum number of requests you're permitted to make per user per minute (e.g. `30`)
`x-rate-limit-user-remaining`	The number of requests remaining in the current rate limit window. (e.g. `28`)
`x-rate-limit-user-reset`	The time at which the current rate limit window resets in UTC epoch milliseconds (e.g. `1631095033096`)
`x-rate-limit-workspace-limit`	The maximum number of requests you're permitted to make per workspace per minute (e.g. `60`)
`x-rate-limit-workspace-remaining`	The number of requests remaining in the current rate limit window. (e.g. `38`)
`x-rate-limit-workspace-reset`	The time at which the current rate limit window resets in UTC epoch milliseconds (e.g. `1631095033096`)
`x-retry-after`	Number of seconds until the current rate limit window resets (e.g. `42`)
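
These headers can be turned into a wait time before the next request. The following sketch assumes that both `-reset` values are epoch milliseconds, as the 13-digit examples suggest, and that header values arrive as strings. The function name is hypothetical.

```python
import time


def seconds_until_allowed(headers, now_ms=None):
    """Return how long to wait before the next request, or 0.0 if quota remains."""
    h = {name.lower(): value for name, value in headers.items()}
    now_ms = time.time() * 1000 if now_ms is None else now_ms
    wait = 0.0
    for scope in ("user", "workspace"):
        remaining = h.get(f"x-rate-limit-{scope}-remaining")
        reset_ms = h.get(f"x-rate-limit-{scope}-reset")
        if remaining is None or reset_ms is None:
            continue  # this scope is not rate-limited on the endpoint
        if int(remaining) <= 0:
            # Quota exhausted: wait until the later of the two windows resets
            wait = max(wait, (int(reset_ms) - now_ms) / 1000.0)
    return wait
```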

Example error

If you exceed the rate limit, an error response is returned:

curl -i -X POST 'https://api.clouddataprep.com/v4/jobGroups' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{ "wrangledDataset": { "id": "<recipe-id>" } }'

HTTP/1.1 429 Too Many Requests
x-rate-limit-user-limit: 30
x-rate-limit-user-remaining: 0
x-rate-limit-user-reset: 1631096271696
x-retry-after: 57

{
  "exception": {
    "name": "TooManyRequestsException",
    "message": "Too Many Requests",
    "details": "API quota reached for \"runJobGroup\". Wait 57 seconds before making a new request. (Max. 30 requests allowed per minute per user.)"
  }
}

Trying the API

You can use a third-party client, such as curl, HTTPie, Postman, or the Insomnia REST client, to test the Dataprep API.

⚠️ When testing the API, bear in mind that you are working with your live production data, not sample data or test data.

Note that you will need to pass an API token with each request.

For example, here is how to run a job with curl:

curl -X POST 'https://api.clouddataprep.com/v4/jobGroups' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <token>' \
-d '{ "wrangledDataset": { "id": "<recipe-id>" } }'
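
The same request can be prepared from a script. The sketch below builds it with the Python standard library and does not send it; the function name is hypothetical, and the base URL is the one used in the curl example. Sending the request runs a real job against live data.

```python
import json
import urllib.request


def build_run_job_request(token, recipe_id, base="https://api.clouddataprep.com"):
    """Prepare (but do not send) a POST /v4/jobGroups request for one recipe."""
    body = json.dumps({"wrangledDataset": {"id": recipe_id}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/v4/jobGroups",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# To actually send it (this launches a job):
# with urllib.request.urlopen(build_run_job_request("<token>", 7)) as resp:
#     job_group = json.load(resp)
```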

Using a graphical tool such as Postman or Insomnia, it is possible to import the API specifications directly:

Download the API specification by clicking the Download button at the top of this document.
Import the JSON specification in the graphical tool of your choice.

In Postman, you can click the import button at the top
With Insomnia, you can just drag-and-drop the file on the UI

Note that with Postman, you can also generate code snippets by selecting a request and clicking on the Code button.

Authentication

BearerAuth

ℹ️ NOTE: Each request to the Dataprep Platform must include authentication credentials.

API access tokens can be acquired and applied to your requests to obscure sensitive Personally Identifiable Information (PII) and are compliant with common privacy and security standards. These tokens last for a preconfigured time period and can be renewed as needed.

You can create and delete access tokens through the Settings area of the application. With each request, you submit the token as part of the Authorization header.

Authorization: Bearer <tokenValue>

As needed, you can create and use additional tokens. There is no limit to the number of tokens you can create. See Manage API Access Tokens for more information.

ℹ️ NOTE: You must be a project owner to create access tokens.

Security Scheme Type	HTTP
HTTP Authorization Scheme	bearer

ImportedDataset

An object representing data loaded into Dataprep, as well as any structuring that has been applied to it. Imported datasets are the starting point for wrangling, and can be used in multiple flows.

List inputs for Output Object

List all the inputs that are linked to this output object. Also include data sources that are present in referenced flows.

ref: getInputsForOutputObject

Authorizations:

BearerAuth

path Parameters

required

integer

Responses

Response samples

Content type

application/json

{
  "data": [
    {
      "dynamicPath": "string",
      "isDynamic": false,
      "isConverted": true,
      "disableTypeInference": true,
      "parsingScript": {"id": 1},
      "storageLocation": {"id": 1},
      "connection": {"id": 1},
      "runParameters": {
        "data": [
          {
            "type": "path",
            "overrideKey": "myVar",
            "insertionIndices": [{"index": 1, "order": 1}],
            "value": {
              "dateRange": {
                "timezone": "string",
                "formats": ["string"],
                "last": {"unit": "years", "number": 1, "dow": 1}
              }
            }
          }
        ]
      },
      "id": 1,
      "createdAt": "2019-08-24T14:15:22Z",
      "updatedAt": "2019-08-24T14:15:22Z",
      "creator": {"id": 1},
      "updater": {"id": 1},
      "workspace": {"id": 1},
      "name": "My Dataset",
      "description": "string"
    }
  ],
  "count": 1
}

Job

An internal object encoding the information necessary to run a part of a Dataprep jobGroup.

This is called a "Stage" on the Job Results page in the UI.

Get Jobs for Job Group

Get information about the batch jobs within a Dataprep job.

ref: getJobsForJobGroup

Authorizations:

BearerAuth

path Parameters

required

integer

Responses

Response samples

Content type

application/json

{
  "data": [
    {
      "id": 1,
      "status": "Complete",
      "jobType": "wrangle",
      "sampleSize": 1,
      "percentComplete": 1,
      "jobGroup": {"id": 1},
      "errorMessage": {"id": 1},
      "lastHeartbeatAt": "2019-08-24T14:15:22Z",
      "createdAt": "2019-08-24T14:15:22Z",
      "updatedAt": "2019-08-24T14:15:22Z",
      "creator": {"id": 1},
      "executionLanguage": "photon",
      "cpJobId": "string",
      "wranglescript": {"id": 1},
      "emrcluster": {"id": 1}
    }
  ],
  "count": 1
}

JobGroup

A collection of internal jobs, representing a single execution from the user, or the generation of a single Sample.

The terminology might be slightly confusing but remains for backward compatibility reasons.

A jobGroup is generally called a "Job" in the UI.
A job is called a "Stage" in the UI.

Run Job Group

Create a jobGroup, which launches the specified job as the authenticated user. This performs the same action as clicking on the Run Job button in the application.

The request specification depends on one of the following conditions:

The recipe (wrangledDataset) already has an output object and just needs to be run.
The recipe has already had a job run against it and just needs to be re-run.
The recipe has not had a job run, or the job definition needs to be re-specified.

In the last case, you must specify some overrides when running the job. See the example with overrides for more information.

ℹ️ NOTE: Override values applied to a job are not validated. Invalid overrides may cause your job to fail.

Request Body - Run job

To run a job, you just specify the recipe identifier (wrangledDataset.id). If the job is successful, all defined outputs are generated, as defined in the outputobject, publications, and writeSettings objects associated with the recipe.

✅ TIP: To identify the wrangledDataset Id, select the recipe icon in the flow view and take the id shown in the URL. e.g. if the URL is /flows/10?recipe=7, the wrangledDataset Id is 7.

{"wrangledDataset": {"id": 7}}

Overriding the output settings

If you must change some outputs or other settings for the specific job, you can insert these changes in the overrides section of the request. In the example below, the running environment, profiling option, and writeSettings for the job are modified for this execution.

{
  "wrangledDataset": {"id": 1},
  "overrides": {
    "execution": "spark",
    "profiler": false,
    "writesettings": [
      {
        "path": "<path_to_output_file>",
        "action": "create",
        "format": "csv",
        "compression": "none",
        "header": false,
        "asSingleFile": false
      }
    ]
  }
}

Using Variables (Run Parameters)

If you have created a dataset with parameters, you can specify overrides for parameter values during execution through the APIs. Through this method, you can iterate job executions across all matching sources of a parameterized dataset. In the example below, the runParameters override has been specified for the country. In this case, the value "Germany" is inserted for the specified variable as part of the job execution.

{
  "wrangledDataset": {"id": 33},
  "runParameters": {
    "overrides": {
      "data": [{"key": "country", "value": "Germany"}]
    }
  }
}
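
To iterate job executions over several parameter values, the request body can be generated rather than written by hand. This is a minimal sketch with a hypothetical helper name; the body layout follows the examples above, and the list of countries is made up for illustration.

```python
def run_job_body(recipe_id, variables=None, overrides=None):
    """Build a Run Job Group request body with optional overrides."""
    body = {"wrangledDataset": {"id": recipe_id}}
    if overrides:
        body["overrides"] = overrides  # e.g. execution, profiler, writesettings
    if variables:
        body["runParameters"] = {
            "overrides": {
                "data": [{"key": key, "value": value} for key, value in variables.items()]
            }
        }
    return body


# One request body per value of the "country" variable
bodies = [run_job_body(33, {"country": c}) for c in ("Germany", "France")]
```

Because override values are not validated, it is worth checking the generated bodies before submitting them.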

Response

The response contains a list of jobs which can be used to get a granular status of the JobGroup completion. The jobGraph indicates the dependency between each of the jobs.

{
  "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
  "reason": "JobStarted",
  "jobGraph": {
    "vertices": [21, 22],
    "edges": [{"source": 21, "target": 22}]
  },
  "id": 9,
  "jobs": {"data": [{"id": 21}, {"id": 22}]}
}
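
The jobGraph can be turned into an order in which to inspect the jobs. The sketch below uses Kahn's topological sort and assumes that an edge from `source` to `target` means the target job depends on the source job; the direction is not stated in this document, so treat it as an assumption.

```python
from collections import deque


def job_order(job_graph):
    """Return job ids so that each job appears after the jobs it depends on."""
    vertices = job_graph["vertices"]
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for edge in job_graph["edges"]:
        children[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)  # jobs with no prerequisites
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order
```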

Acquire internal jobGroup identifier

When you create a new jobGroup through the APIs, the internal jobGroup identifier is returned in the response. Retain this identifier for future use. You can also acquire the jobGroup identifier from the application. In the Jobs page, the internal identifier for the jobGroup is the value in the left column.

Quotas:
30 req./user/min, 60 req./workspace/min

ref: runJobGroup

Authorizations:

BearerAuth

header Parameters

x-execution-id

string

Example: f9cab740-50b7-11e9-ba15-93c82271a00b

Optional header to safely retry the request without accidentally performing the same operation twice. If a JobGroup with the same executionId already exists, the request will return a 304.
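
A safe retry depends on generating the identifier once and reusing it for every attempt of the same logical request. The following sketch assumes that any unique string is accepted (the example value above looks like a UUID, so a UUID is used here); the helper names are hypothetical.

```python
import uuid


def new_execution_id():
    """Create an identifier to reuse across all retries of one job run."""
    return str(uuid.uuid4())


def run_job_headers(token, execution_id):
    """Headers for a Run Job Group request that can be retried safely."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "x-execution-id": execution_id,
    }


def already_submitted(status):
    """A 304 means a JobGroup with this execution id already exists."""
    return status == 304
```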

Request Body schema: application/json

wrangledDataset required	object The identifier for the recipe you would like to run.
forceCacheUpdate	boolean Setting this flag to true will invalidate any cached datasources. This only applies to SQL datasets.
ignoreRecipeErrors	boolean Default: false Setting this flag to true means the job runs even if there are upstream recipe errors. Setting it to false causes the request to fail on recipe errors.
testMode	boolean Setting this flag to true will not run the job but just perform some validations.
runParameters	object (runParameterOverrides) Allows overriding parameters that are defined in the flow, for example on datasets or outputs.
workspaceId	integer Internal. Does not need to be specified.
overrides	object Allows overriding execution settings that are set on the output object.
ranfrom	string Enum: "ui" "schedule" "api" Where the job was executed from. Does not need to be specified when using the API. `ui` - Dataprep application `schedule` - Scheduled `api` - the API (using an API token)

Responses

Request samples

Content type

application/json

Example

{"wrangledDataset": {"id": 7}}

Response samples

Content type

application/json

{
  "sessionId": "79276c31-c58c-4e79-ae5e-fed1a25ebca1",
  "reason": "JobStarted",
  "jobGraph": {
    "vertices": [21, 22],
    "edges": [{"source": 21, "target": 22}]
  },
  "id": 9,
  "jobs": {"data": [{"id": 21}, {"id": 22}]}
}

Get job group

Get the specified jobGroup.

A job group is a job that is executed from a specific node in a flow. The job group may contain:

Wrangling job on the dataset associated with the node
Jobs on all datasets on which the selected job may depend
A profiling job for the job group

JobGroup Status

It is possible to only get the current status for a jobGroup:

/v4/jobGroups/{id}/status

In that case, the response status would simply be a string:

"Complete"

Embedding resources

If you wish to also get the related jobs and wrangledDataset, you can use embed. See embedding resources for more information.

/v4/jobGroups/{id}?embed=jobs,wrangledDataset
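
For the status endpoint described above (/v4/jobGroups/{id}/status), a polling loop can be sketched as follows. `get_status` is a hypothetical callable that performs the GET request and returns the status string. Only "Complete" appears in this document, so the other terminal values listed here are assumptions, as are the interval and timeout defaults.

```python
import time

# Assumed terminal states; only "Complete" is confirmed by this document.
TERMINAL_STATUSES = {"Complete", "Failed", "Canceled"}


def wait_for_job_group(get_status, job_group_id, interval=10.0, timeout=3600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll the status endpoint until the jobGroup reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_group_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"jobGroup {job_group_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Keep the polling interval modest so that status checks do not add unnecessary load.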

ref: getJobGroup

Authorizations:

BearerAuth

path Parameters

required

integer

query Parameters

fields	string Example: fields=id;name;description Semicolon-separated list of fields
embed	string Example: embed=association.otherAssociation,anotherAssociation Comma-separated list of objects to pull in as part of the response. See Embedding Resources for more information.
	string or Array of strings Whether to include all or some of the nested deleted objects.

Responses

Response samples

Content type

application/json

{
  "name": "string",
  "description": "string",
  "ranfrom": "ui",
  "ranfor": "recipe",
  "status": "Complete",
  "profilingEnabled": true,
  "runParameterReferenceDate": "2019-08-24T14:15:22Z",
  "snapshot": {"id": 1},
  "wrangledDataset": {"id": 1},
  "flowrun": {"id": 1},
  "id": 1,
  "createdAt": "2019-08-24T14:15:22Z",
  "updatedAt": "2019-08-24T14:15:22Z",
  "creator": {"id": 1},
  "updater": {"id": 1}
}

Get Jobs for Job Group

Get information about the batch jobs within a Dataprep job.

ref: getJobsForJobGroup

Authorizations:

BearerAuth

path Parameters

required

integer

Responses

Response samples

Content type

application/json

{
  "data": [
    {
      "id": 1,
      "status": "Complete",
      "jobType": "wrangle",
      "sampleSize": 1,
      "percentComplete": 1,
      "jobGroup": {"id": 1},
      "errorMessage": {"id": 1},
      "lastHeartbeatAt": "2019-08-24T14:15:22Z",
      "createdAt": "2019-08-24T14:15:22Z",
      "updatedAt": "2019-08-24T14:15:22Z",
      "creator": {"id": 1},
      "executionLanguage": "photon",
      "cpJobId": "string",
      "wranglescript": {"id": 1},
      "emrcluster": {"id": 1}
    }
  ],
  "count": 1
}

Misc

A collection of miscellaneous endpoints.

Get OpenAPI specification

Get Open Api specifications

ref: getOpenApiSpec

Authorizations:

BearerAuth

Responses

Response samples

Content type

application/json

{ }

OutputObject

An outputObject is a definition of one or more types of outputs and how they are generated.

It must be associated with a recipe (also called wrangledDataset).
An outputObject must be created for a recipe before you can run a job on it.
An outputObject stores references to writeSettings, publications and sqlScripts.

Create output object

If an outputObject already exists for the recipe (flowNodeId) to which you are posting, you must either modify the object instead or delete it before posting your new object.

ref: createOutputObject

Authorizations:

BearerAuth

Request Body schema: application/json

execution required	string Enum: "photon" "dataflow" Execution language. Indicate on which engine the job was executed. Can be null/missing for scheduled jobs that fail during the validation phase. `photon` - Photon engine. High performance embedded engine designed for small datasets (up to 1GB). Not available for all product editions. `dataflow` - Google Dataflow engine. Fully managed service for executing Apache Beam pipelines.
profiler required	boolean Indicate if profiling information should be produced as part of the jobGroup. This will create a Profile job.
isAdhoc	boolean Default: true Indicate if the outputObject corresponds to a manual (Adhoc) or scheduled run.
ignoreRecipeErrors	boolean Indicate if recipe errors should be ignored for the jobGroup.
flowNodeId	integer FlowNode the outputObject should be attached to. (This is also the id of the wrangledDataset).
writeSettings	Array of objects (writeSettingCreateRequest) [ items ] Optionally you can include writeSettings while creating the outputObject
sqlScripts	Array of objects (sqlScriptCreateRequest) [ items ] Optionally you can include sqlScripts while creating the outputObject
publications	Array of objects (publicationCreateRequest) [ items ] Optionally you can include publications while creating the outputObject
outputObjectSchemaDriftOptions	object (outputObjectSchemaDriftOptionsUpdateRequest)

Responses

Request samples

Content type

application/json

{
  "execution": "photon",
  "profiler": true,
  "isAdhoc": true,
  "ignoreRecipeErrors": true,
  "flowNodeId": 1,
  "writeSettings": [
    {
      "path": "string",
      "action": "create",
      "format": "csv",
      "compression": "none",
      "header": true,
      "asSingleFile": true,
      "delim": ",",
      "hasQuotes": true,
      "includeMismatches": true,
      "outputObjectId": 1,
      "runParameters": [
        {
          "type": "path",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ],
      "connectionId": "25"
    }
  ],
  "sqlScripts": [
    {
      "sqlScript": "string",
      "type": "string",
      "vendor": "string",
      "outputObjectId": "21",
      "connectionId": "21",
      "runParameters": [
        {
          "type": "sql",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ]
    }
  ],
  "publications": [
    {
      "path": ["string"],
      "tableName": "string",
      "targetType": "string",
      "action": "create",
      "outputObjectId": 1,
      "connectionId": "21",
      "runParameters": [
        {
          "type": "path",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ],
      "parameters": {
        "property1": {"type": "string", "default": null},
        "property2": {"type": "string", "default": null}
      }
    }
  ],
  "outputObjectSchemaDriftOptions": {
    "schemaValidation": "true",
    "stopJobOnErrorsFound": "false"
  }
}

Response samples

Content type

application/json

{
  "execution": "photon",
  "profiler": true,
  "isAdhoc": true,
  "flownode": {"id": 1},
  "id": 1,
  "createdAt": "2019-08-24T14:15:22Z",
  "updatedAt": "2019-08-24T14:15:22Z",
  "creator": {"id": 1},
  "updater": {"id": 1},
  "name": "string",
  "description": "string"
}

List output objects

List existing output objects

ref: listOutputObjects

Authorizations:

BearerAuth

query Parameters

fields	string Example: fields=id;name;description Semicolon-separated list of fields
embed	string Example: embed=association.otherAssociation,anotherAssociation Comma-separated list of objects to pull in as part of the response. See Embedding Resources for more information.
	string or Array of strings Whether to include all or some of the nested deleted objects.
limit	integer Default: 25 Maximum number of objects to fetch.
offset	integer Offset after which to start returning objects. For use with `limit`.
filterType	string Default: "fuzzy" Defines the filter type, one of ["fuzzy", "contains", "exact", "exactIgnoreCase"]. For use with `filter`.
sort	string Example: sort=-createdAt Defines sort order for returned objects
filterFields	string Default: "name" Example: filterFields=id,order Comma-separated list of fields to match the `filter` parameter against.
filter	string Example: filter=my-object Value for filtering objects. See `filterFields`.
includeCount	boolean If includeCount is true, it will include the total number of objects as a count object in the response

Responses

Response samples

Content type

application/json

{
  "data": [
    {
      "execution": "photon",
      "profiler": true,
      "isAdhoc": true,
      "flownode": {"id": 1},
      "id": 1,
      "createdAt": "2019-08-24T14:15:22Z",
      "updatedAt": "2019-08-24T14:15:22Z",
      "creator": {"id": 1},
      "updater": {"id": 1},
      "name": "string",
      "description": "string"
    }
  ],
  "count": 1
}

Count output objects

Count existing output objects

ref: countOutputObjects

Authorizations:

BearerAuth

query Parameters

fields	string Example: fields=id;name;description Semicolon-separated list of fields
embed	string Example: embed=association.otherAssociation,anotherAssociation Comma-separated list of objects to pull in as part of the response. See Embedding Resources for more information.
	string or Array of strings Whether to include all or some of the nested deleted objects.
limit	integer Default: 25 Maximum number of objects to fetch.
offset	integer Offset after which to start returning objects. For use with `limit`.
filterType	string Default: "fuzzy" Defines the filter type, one of ["fuzzy", "contains", "exact", "exactIgnoreCase"]. For use with `filter`.
sort	string Example: sort=-createdAt Defines sort order for returned objects
filterFields	string Default: "name" Example: filterFields=id,order Comma-separated list of fields to match the `filter` parameter against.
filter	string Example: filter=my-object Value for filtering objects. See `filterFields`.
includeCount	boolean If includeCount is true, it will include the total number of objects as a count object in the response

Responses

Response samples

Content type

application/json

{"count": 1}

Get output object

Get the specified outputObject.

Note that it is possible to include writeSettings and publications that are linked to this outputObject. See embedding resources for more information.

/v4/outputObjects/{id}?embed=writeSettings,publications

You can also access outputobjectdataflowoptions:

/v4/outputObjects/{id}?embed=outputobjectdataflowoptions

ref: getOutputObject

Authorizations:

BearerAuth

path Parameters

required

integer

query Parameters

fields	string Example: fields=id;name;description Semicolon-separated list of fields
embed	string Example: embed=association.otherAssociation,anotherAssociation Comma-separated list of objects to pull in as part of the response. See Embedding Resources for more information.
	string or Array of strings Whether to include all or some of the nested deleted objects.

Responses

Response samples

Content type

application/json

{
  "execution": "photon",
  "profiler": true,
  "isAdhoc": true,
  "flownode": {"id": 1},
  "id": 1,
  "createdAt": "2019-08-24T14:15:22Z",
  "updatedAt": "2019-08-24T14:15:22Z",
  "creator": {"id": 1},
  "updater": {"id": 1},
  "name": "string",
  "description": "string"
}

Patch output object

Patch an existing output object

ref: patchOutputObject

Authorizations:

BearerAuth

path Parameters

required

integer

Request Body schema: application/json

execution	string Enum: "photon" "dataflow" Execution language. Indicate on which engine the job was executed. Can be null/missing for scheduled jobs that fail during the validation phase. `photon` - Photon engine. High performance embedded engine designed for small datasets (up to 1GB). Not available for all product editions. `dataflow` - Google Dataflow engine. Fully managed service for executing Apache Beam pipelines.
profiler	boolean Indicate if profiling information should be produced as part of the jobGroup. This will create a Profile job.
ignoreRecipeErrors	boolean Indicate if recipe errors should be ignored for the jobGroup.
writeSettings	Array of objects (writeSettingCreateRequest) [ items ]
sqlScripts	Array of objects (sqlScriptCreateRequest) [ items ]
publications	Array of objects (publicationCreateRequest) [ items ]
outputObjectDataflowOptions	object (outputObjectDataflowOptionsUpdateRequest) ℹ️ NOTE: Issues that are shown as warnings in the UI will not be present when using the API. ℹ️ NOTE: If VPC network mode is set to `AUTO`, do not include entries in the request for `network`, `subnetwork` or for `usePublicIps`. ℹ️ NOTE: If autoscaling algorithm is set to `NONE`, do not include an entry in the request for `maxNumWorkers`.
outputObjectSchemaDriftOptions	object (outputObjectSchemaDriftOptionsUpdateRequest)
name	string Name of output as it appears in the flow view
description	string Description of output

Responses

Request samples

Content type

application/json

{
  "execution": "photon",
  "profiler": true,
  "ignoreRecipeErrors": true,
  "writeSettings": [
    {
      "path": "string",
      "action": "create",
      "format": "csv",
      "compression": "none",
      "header": true,
      "asSingleFile": true,
      "delim": ",",
      "hasQuotes": true,
      "includeMismatches": true,
      "outputObjectId": 1,
      "runParameters": [
        {
          "type": "path",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ],
      "connectionId": "25"
    }
  ],
  "sqlScripts": [
    {
      "sqlScript": "string",
      "type": "string",
      "vendor": "string",
      "outputObjectId": "21",
      "connectionId": "21",
      "runParameters": [
        {
          "type": "sql",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ]
    }
  ],
  "publications": [
    {
      "path": ["string"],
      "tableName": "string",
      "targetType": "string",
      "action": "create",
      "outputObjectId": 1,
      "connectionId": "21",
      "runParameters": [
        {
          "type": "path",
          "overrideKey": "myVar",
          "insertionIndices": [{"index": 1, "order": 1}],
          "value": {"variable": {"value": "string"}}
        }
      ],
      "parameters": {
        "property1": {"type": "string", "default": null},
        "property2": {"type": "string", "default": null}
      }
    }
  ],
  "outputObjectDataflowOptions": {
    "region": "us-central1",
    "zone": "us-central1-a",
    "machineType": "n1-standard-64",
    "network": "my-network-name",
    "subnetwork": "regions/us-central1/subnetworks/my-subnetwork",
    "autoscalingAlgorithm": "THROUGHPUT_BASED",
    "serviceAccount": "my-service-account-name@<project-id>.iam.gserviceaccount.com",
    "numWorkers": "1",
    "maxNumWorkers": "1000",
    "usePublicIps": "true",
    "labels": [{"key": "my-billing-label-key", "value": "my-billing-label-value"}]
  },
  "outputObjectSchemaDriftOptions": {
    "schemaValidation": "true",
    "stopJobOnErrorsFound": "false"
  },
  "name": "string",
  "description": "string"
}

Response samples

Content type

application/json

{
  "id": 1,
  "updater": {"id": 1},
  "createdAt": "2019-08-24T14:15:22Z",
  "updatedAt": "2019-08-24T14:15:22Z"
}

Delete output object

Delete an existing output object

ref: deleteOutputObject

Authorizations:

BearerAuth

path Parameters

required

integer

Responses

List inputs for Output Object

List all the inputs that are linked to this output object. Also include data sources that are present in referenced flows.

ref: getInputsForOutputObject

Authorizations:

BearerAuth

path Parameters

required

integer

Responses

Response samples

Content type

application/json

{
  "data": [
    {
      "dynamicPath": "string",
      "isDynamic": false,
      "isConverted": true,
      "disableTypeInference": true,
      "parsingScript": {"id": 1},
      "storageLocation": {"id": 1},
      "connection": {"id": 1},
      "runParameters": {
        "data": [
          {
            "type": "path",
            "overrideKey": "myVar",
            "insertionIndices": [{"index": 1, "order": 1}],
            "value": {
              "dateRange": {
                "timezone": "string",
                "formats": ["string"],
                "last": {"unit": "years", "number": 1, "dow": 1}
              }
            }
          }
        ]
      },
      "id": 1,
      "createdAt": "2019-08-24T14:15:22Z",
      "updatedAt": "2019-08-24T14:15:22Z",
      "creator": {"id": 1},
      "updater": {"id": 1},
      "workspace": {"id": 1},
      "name": "My Dataset",
      "description": "string"
    }
  ],
  "count": 1
}

Person

A Dataprep user.

Set Dataflow option for self (Deprecated)

Please use setDataflowOptionForCurrentPerson instead.

ref: setDataflowOptionForPerson

Authorizations:

BearerAuth

path Parameters

type

required

string

region, zone, machineType, network, subnetwork, autoscalingAlgorithm, serviceAccount, numWorkers, maxNumWorkers, usePublicIps, labels.

Request Body schema: application/json

required

string or Array of objects

Responses

Request samples

Content type

application/json

Example

{"value": "us-central-1"}

Response samples

Content type

application/json

Example

{
  "region": {
    "key": "region",
    "value": "us-central-1",
    "person": {"id": 1},
    "id": 1,
    "createdAt": "2020-04-20T12:49:41Z",
    "updatedAt": "2020-04-20T12:49:41Z"
  }
}

Workspace

A self-contained, configurable space shared by several users, containing flows, Datasets, connections, and other Dataprep objects.

Transfer User Assets

Transfer Dataprep assets to another user in the current workspace. For the given workspace, assigns ownership of all the user's contents to another user. This includes flows, datasets, recipes, and connections–basically any object that can be created and managed through the Dataprep UI.

ref: transferUserAssetsInCurrentWorkspace

Authorizations:

BearerAuth

Request Body schema: application/json

One of

fromPersonId required	integer or string The id of the person to transfer assets from.
toPersonId required	integer or string The id of the person to transfer assets to.
assets	object Asset IDs that need to be transferred. To specify all assets of a certain type, use "all" instead of an integer array. If the assets payload is not provided, all assets of all types will be transferred.

Responses

Request samples

Content type

application/json

{
  "fromPersonId": 2,
  "toPersonId": 5,
  "assets": {
    "connections": [702, 704],
    "datasources": [111, 112, 113],
    "flows": [201, 202],
    "macros": "all",
    "userdefinedfunctions": [310, 307, 308],
    "plans": [510, 512]
  }
}
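
A transfer request body like the one above can be assembled and checked before it is sent. The sketch below uses a hypothetical helper; the asset type names are passed through unchanged, so only the rule that each value is either "all" or a list of integer ids is checked.

```python
def transfer_body(from_person_id, to_person_id, **assets):
    """Build a Transfer User Assets body; omit assets to transfer everything."""
    body = {"fromPersonId": from_person_id, "toPersonId": to_person_id}
    for kind, ids in assets.items():
        # Each asset type takes either the string "all" or a list of integer ids
        if ids != "all" and not all(isinstance(i, int) for i in ids):
            raise ValueError(f"{kind} must be 'all' or a list of integer ids")
    if assets:
        body["assets"] = assets
    return body


partial = transfer_body(2, 5, flows=[201, 202], macros="all")
everything = transfer_body(2, 5)
```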