GET https://api.pretectum.io/v1/businessareas/{businessAreaId}/schemas/{schemaId}/datasets
List Datasets
curl --request GET \
  --url https://api.pretectum.io/v1/businessareas/{businessAreaId}/schemas/{schemaId}/datasets \
  --header 'Authorization: <authorization>'
{
  "items": [
    {
      "dataSetId": "<string>",
      "dataSetName": "<string>",
      "dataSetDescription": "<string>",
      "businessAreaId": "<string>",
      "businessAreaName": "<string>",
      "schemaId": "<string>",
      "schemaName": "<string>",
      "recordCount": 123,
      "erroredRecordsCount": 123,
      "runningJobsCount": 123,
      "version": 123,
      "createdBy": "<string>",
      "createdByEmail": "<string>",
      "createdByName": "<string>",
      "updatedBy": "<string>",
      "updatedByEmail": "<string>",
      "updatedByName": "<string>",
      "createdDate": "<string>",
      "updatedDate": "<string>",
      "deleted": true
    }
  ],
  "nextPageKey": "<string>"
}
The List Datasets endpoint returns the datasets defined within a specific schema, one page at a time when there are many. A dataset is a collection of data objects that share the same schema structure; its records are the actual data held in your master data repository.

Prerequisites

Authentication

Include your access token in the Authorization header.
Pass the token directly without the “Bearer” prefix.
Authorization: your_access_token

Request

Path Parameters

businessAreaId (string, required)
The unique identifier of the business area. You can obtain this from the List Business Areas endpoint.
schemaId (string, required)
The unique identifier of the schema containing the datasets. You can obtain this from the List Schemas endpoint.

Query Parameters

pageKey (string)
A pagination token for retrieving the next page of results. This value is returned in the response as nextPageKey when more results are available.

Headers

Authorization (string, required)
Your access token obtained from the /v1/oauth2/token endpoint. Pass the token directly without the “Bearer” prefix.
Accept (string, default: "application/json")
The response content type. Currently only application/json is supported.

Example Requests

# List all datasets in a schema
curl -X GET "https://api.pretectum.io/v1/businessareas/20240115103000123a1b2c3d4e5f6789012345678901234/schemas/20240115103000456d1e2f3a4b5c6789012345678901234/datasets" \
  -H "Authorization: your_access_token" \
  -H "Accept: application/json"

# Paginate through datasets
curl -X GET "https://api.pretectum.io/v1/businessareas/20240115103000123a1b2c3d4e5f6789012345678901234/schemas/20240115103000456d1e2f3a4b5c6789012345678901234/datasets?pageKey=eyJMYXN0RXZhbHVhdGVkS2V5Ijp7Li4ufQ" \
  -H "Authorization: your_access_token" \
  -H "Accept: application/json"
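
For readers calling the endpoint from Python, the two curl requests above might be sketched as follows using only the standard library. The URL, the pageKey parameter, and the headers come from this page; the helper name build_list_datasets_request and the choice to pass the token as an argument are illustrative assumptions, not part of the Pretectum API.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.pretectum.io/v1"


def build_list_datasets_request(
    business_area_id: str,
    schema_id: str,
    access_token: str,
    page_key: Optional[str] = None,
) -> Request:
    """Build (but do not send) a List Datasets request. Hypothetical helper."""
    url = (
        f"{BASE_URL}/businessareas/{quote(business_area_id, safe='')}"
        f"/schemas/{quote(schema_id, safe='')}/datasets"
    )
    if page_key:
        # pageKey is the only query parameter documented for this endpoint.
        url += "?" + urlencode({"pageKey": page_key})
    return Request(
        url,
        method="GET",
        headers={
            # The token is sent as-is, without a "Bearer" prefix.
            "Authorization": access_token,
            "Accept": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request; json.load on the response then yields the parsed body.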

Response

A successful request returns an object containing an array of datasets and pagination information.
items (array, required)
An array of dataset objects within the schema.
nextPageKey (string)
A pagination token for retrieving the next page of results. If this field is present, more datasets are available. Pass this value as the pageKey query parameter in your next request.

Example Response

{
  "items": [
    {
      "dataSetId": "20240925152201042a1b2c3d4e5f6789012345678901234",
      "dataSetName": "US Customers",
      "dataSetDescription": "Customer records for United States region",
      "businessAreaId": "20240115103000123a1b2c3d4e5f6789012345678901234",
      "businessAreaName": "Customer",
      "schemaId": "20240115103000456d1e2f3a4b5c6789012345678901234",
      "schemaName": "Individual Customer",
      "recordCount": 15420,
      "erroredRecordsCount": 12,
      "runningJobsCount": 0,
      "version": 5,
      "createdBy": "9ae5f422-bb62-4c9d-b277-594ddcda6d8d",
      "createdByEmail": "admin@example.com",
      "createdByName": "John Admin",
      "updatedBy": "9ae5f422-bb62-4c9d-b277-594ddcda6d8d",
      "updatedByEmail": "admin@example.com",
      "updatedByName": "John Admin",
      "createdDate": "2024-09-25T15:22:01.042Z",
      "updatedDate": "2024-12-15T10:30:00.000Z",
      "deleted": false
    },
    {
      "dataSetId": "20240926090000123b2c3d4e5f6a7890123456789012345",
      "dataSetName": "European Customers",
      "dataSetDescription": "Customer records for European region",
      "businessAreaId": "20240115103000123a1b2c3d4e5f6789012345678901234",
      "businessAreaName": "Customer",
      "schemaId": "20240115103000456d1e2f3a4b5c6789012345678901234",
      "schemaName": "Individual Customer",
      "recordCount": 8750,
      "erroredRecordsCount": 3,
      "runningJobsCount": 0,
      "version": 2,
      "createdBy": "b5f6g733-cc73-5d0e-c388-605eeda7e9e",
      "createdByEmail": "data_manager@example.com",
      "createdByName": "Jane Manager",
      "updatedBy": "b5f6g733-cc73-5d0e-c388-605eeda7e9e",
      "updatedByEmail": "data_manager@example.com",
      "updatedByName": "Jane Manager",
      "createdDate": "2024-09-26T09:00:00.123Z",
      "updatedDate": "2024-11-20T14:45:00.000Z",
      "deleted": false
    }
  ],
  "nextPageKey": "eyJMYXN0RXZhbHVhdGVkS2V5Ijp7ImRhdGFTZXRJZCI6IjIwMjQwOTI2MDkwMDAw..."
}

Response Without Pagination

When all datasets fit in a single response, no nextPageKey is returned:
{
  "items": [
    {
      "dataSetId": "20240925152201042a1b2c3d4e5f6789012345678901234",
      "dataSetName": "US Customers",
      "dataSetDescription": "Customer records for United States region",
      "businessAreaId": "20240115103000123a1b2c3d4e5f6789012345678901234",
      "businessAreaName": "Customer",
      "schemaId": "20240115103000456d1e2f3a4b5c6789012345678901234",
      "schemaName": "Individual Customer",
      "recordCount": 15420,
      "erroredRecordsCount": 0,
      "runningJobsCount": 0,
      "version": 5,
      "createdBy": "9ae5f422-bb62-4c9d-b277-594ddcda6d8d",
      "createdByEmail": "admin@example.com",
      "createdByName": "John Admin",
      "updatedBy": "9ae5f422-bb62-4c9d-b277-594ddcda6d8d",
      "updatedByEmail": "admin@example.com",
      "updatedByName": "John Admin",
      "createdDate": "2024-09-25T15:22:01.042Z",
      "updatedDate": "2024-12-15T10:30:00.000Z",
      "deleted": false
    }
  ]
}

Empty Response

If the schema has no datasets defined, the response will contain an empty items array:
{
  "items": []
}
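
The three response shapes above (paginated, single page, empty) can be handled by one small parser. The following is a minimal sketch; parse_datasets_page is a hypothetical name, and falling back to an empty list when items is missing is a defensive assumption, since the page documents items as required.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def parse_datasets_page(body: str) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return (items, next_page_key) from a List Datasets response body."""
    payload = json.loads(body)
    # items is documented as required; the default only guards against surprises.
    items = payload.get("items", [])
    # nextPageKey is absent on the last (or only) page, so this may be None.
    next_page_key = payload.get("nextPageKey")
    return items, next_page_key
```

A next_page_key of None signals that no further request is needed.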

Error Responses

Status Code | Description
401 Unauthorized | Invalid or expired access token. Obtain a new token from /v1/oauth2/token and try again.
403 Forbidden | Your application client does not have permission to access datasets. Contact your tenant administrator.
404 Not Found | The specified business area or schema does not exist, or you do not have access to it.
500 Internal Server Error | An unexpected error occurred on the server. Try again later or contact support.
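
One way a client might act on these status codes is sketched below. The messages paraphrase the error descriptions above; the function names and the decision to treat only 500 as retryable and only 401 as a token-refresh signal are assumptions made for illustration, not documented API behavior.

```python
# Guidance paraphrased from the documented error responses, keyed by HTTP status.
ERROR_GUIDANCE = {
    401: "Invalid or expired access token; obtain a new token from /v1/oauth2/token.",
    403: "Client lacks permission to access datasets; contact the tenant administrator.",
    404: "Business area or schema not found, or not accessible to this client.",
    500: "Unexpected server error; try again later or contact support.",
}


def describe_error(status: int) -> str:
    """Return a human-readable hint for a failed List Datasets call."""
    return ERROR_GUIDANCE.get(status, f"Unexpected HTTP status {status}.")


def should_refresh_token(status: int) -> bool:
    """Assumption: a 401 is the only signal that a new token is needed."""
    return status == 401


def is_retryable(status: int) -> bool:
    """Assumption: only a 500 is worth retrying after a delay."""
    return status == 500
```

With urllib, the status is available as the code attribute of the urllib.error.HTTPError raised by urlopen, so describe_error(err.code) can feed a log message.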

Pagination

When a schema contains many datasets, results are paginated. Use the nextPageKey from the response to fetch subsequent pages:
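// Fetch one page of datasets. Assumes Node 18+ (global fetch) and an access
// token in the PRETECTUM_ACCESS_TOKEN environment variable (an illustrative choice).
async function getDatasets(businessAreaId, schemaId, pageKey) {
  const url = new URL(`https://api.pretectum.io/v1/businessareas/${businessAreaId}/schemas/${schemaId}/datasets`);
  if (pageKey) url.searchParams.set('pageKey', pageKey);
  const response = await fetch(url, {
    headers: { Authorization: process.env.PRETECTUM_ACCESS_TOKEN, Accept: 'application/json' },
  });
  if (!response.ok) throw new Error(`List Datasets failed: HTTP ${response.status}`);
  return response.json();
}

// Follow nextPageKey until the response no longer includes one.
async function getAllDatasets(businessAreaId, schemaId) {
  const allDatasets = [];
  let pageKey = null;

  do {
    const response = await getDatasets(businessAreaId, schemaId, pageKey);
    allDatasets.push(...response.items);
    pageKey = response.nextPageKey;
  } while (pageKey);

  return allDatasets;
}

// businessAreaId and schemaId hold the IDs described under Path Parameters.
const allDatasets = await getAllDatasets(businessAreaId, schemaId);
console.log(`Total datasets: ${allDatasets.length}`);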
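
The same loop can be written in Python as a generator, which lets callers stop early without fetching every page. This is a sketch under stated assumptions: fetch_page is a hypothetical callable that takes a page key (None for the first page) and returns the parsed JSON of one response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical signature: page key in (None for the first page), parsed JSON out.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def iter_datasets(fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield every dataset, following nextPageKey until it is absent."""
    page_key: Optional[str] = None
    while True:
        page = fetch_page(page_key)
        yield from page.get("items", [])
        page_key = page.get("nextPageKey")
        if not page_key:
            # No nextPageKey means this was the last page.
            break
```

Injecting fetch_page keeps the loop independent of the HTTP client and easy to exercise with canned pages.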

Use Cases

Filtering Search Results by Dataset

Use the dataset names returned by this endpoint to filter your data object searches:
# First, get the list of datasets
curl -X GET "https://api.pretectum.io/v1/businessareas/{businessAreaId}/schemas/{schemaId}/datasets" \
  -H "Authorization: your_access_token"

# Then search within a specific dataset
curl -X GET "https://api.pretectum.io/dataobjects/search?query=John&dataSet=US%20Customers" \
  -H "Authorization: your_access_token"
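
Dataset names often contain spaces, so the dataSet value needs URL encoding, as the %20 in the curl example shows. A minimal sketch of building that search URL in Python follows; build_search_url is a hypothetical helper, and the search URL is copied from the curl example above rather than taken from the search endpoint's own reference.

```python
from urllib.parse import quote, urlencode

# Copied from the curl example above; check the search endpoint's reference for the exact path.
SEARCH_URL = "https://api.pretectum.io/dataobjects/search"


def build_search_url(query: str, dataset_name: str) -> str:
    """Build a search URL filtered by a dataSetName value."""
    params = {"query": query, "dataSet": dataset_name}
    # quote_via=quote encodes spaces as %20; urlencode's default would use "+".
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)
```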

Monitoring Data Quality

Check the erroredRecordsCount to identify datasets with data quality issues:
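// getDatasets is the single-page helper used in the Pagination example, so
// only the first page is checked here; loop over all pages for a full report.
const datasets = await getDatasets(businessAreaId, schemaId);

const datasetsWithErrors = datasets.items.filter(ds => ds.erroredRecordsCount > 0);
datasetsWithErrors.forEach(ds => {
  console.log(`${ds.dataSetName}: ${ds.erroredRecordsCount} errors out of ${ds.recordCount} records`);
});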
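
Raw error counts can mislead when datasets differ greatly in size, so ranking by error rate may be more useful. The sketch below assumes that erroredRecordsCount counts a subset of recordCount, which this page implies but does not state; error_rates is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def error_rates(items: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Return (dataSetName, error rate) pairs, highest rate first."""
    rates = []
    for ds in items:
        errors = ds["erroredRecordsCount"]
        total = ds["recordCount"]
        # Skip clean datasets, and empty ones to avoid dividing by zero.
        if errors > 0 and total > 0:
            rates.append((ds["dataSetName"], errors / total))
    return sorted(rates, key=lambda pair: pair[1], reverse=True)
```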

Tracking Record Counts

Monitor the size of your datasets:
# get_datasets is assumed to be a helper that returns the parsed JSON of one
# List Datasets response; with multiple pages, sum across every page instead.
datasets = get_datasets(business_area_id, schema_id)

total_records = sum(ds['recordCount'] for ds in datasets['items'])
print(f"Total records across all datasets: {total_records}")

for ds in sorted(datasets['items'], key=lambda x: x['recordCount'], reverse=True):
    print(f"  {ds['dataSetName']}: {ds['recordCount']:,} records")

Best Practices

  1. Cache dataset lists: Dataset metadata changes less frequently than record data. Cache the response and refresh periodically.
  2. Show only active datasets: Exclude datasets where deleted is true from user-facing interfaces.
  3. Use names for search filters: When filtering searches with the dataSet parameter, use the dataSetName field value.
  4. Handle pagination: Always check for nextPageKey in responses and fetch all pages if needed.
  5. Monitor error counts: Regularly check erroredRecordsCount to identify data quality issues early.
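
The first two practices could be sketched as a small time-based cache plus a filter on the deleted flag. Everything named here is hypothetical: DatasetListCache, active_datasets, the loader callable, and the 300-second default lifetime are illustrative choices, since this page does not recommend a specific refresh interval.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Dataset = Dict[str, Any]


class DatasetListCache:
    """Cache a dataset list and reload it once it is older than ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], List[Dataset]],
        ttl_seconds: float = 300.0,  # arbitrary default, not an API recommendation
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[List[Dataset]] = None
        self._loaded_at = 0.0

    def get(self) -> List[Dataset]:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            # First call or stale entry: reload from the API via the loader.
            self._value = self._loader()
            self._loaded_at = now
        return self._value


def active_datasets(items: List[Dataset]) -> List[Dataset]:
    """Drop datasets flagged as deleted before showing them to users."""
    return [ds for ds in items if not ds.get("deleted", False)]
```

The loader would typically wrap a full paginated fetch, so a cache hit saves every page request rather than just one.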