Uploading document images | API

The path /rest/uploads provides endpoints for importing a document into Transkribus.

https://transkribus.eu/TrpServerTesting/rest/uploads?collId={collectionID}

A POST request to this endpoint creates a new upload process on the server. The query parameter collId is mandatory and must contain the ID of a collection to which the user has write access.

If the Content-Type header specifies application/json, then a JSON object of the following form is expected:

{
    "md": {
        "title": "Bentham Box 35",
        "author": "Jeremy Bentham",
        "genre": "Notes",
        "writer": "Secretary"
    },
    "pageList": {"pages": [
        {
            "fileName": "035_320_001.jpg",
            "pageXmlName": "035_320_001.xml",
            "pageNr": 1,
            "imgChecksum": "9d531932c8e24d5a5dc13c92063698c9",
            "pageXmlChecksum": "b644a9c34a65ee07c1c576194e720b4a"
        },
        {
            "fileName": "035_321_001.jpg",
            "pageXmlName": "035_321_001.xml",
            "pageNr": 2,
            "imgChecksum": "e3ae1a862b9cd53cc87c9325d2502547",
            "pageXmlChecksum": "8ba4758b8b8d5df562e25809692be340"
        }
    ]}
}

Besides some basic (optional) metadata, this object defines the structure of the document to upload, including the filenames to expect.
A page object only requires a fileName and a pageNr; all other fields are optional. If checksums are supplied, they must be MD5 checksums.
The response to this request is an enriched object of the same type. It includes a unique upload ID (field uploadId) that must be used in the subsequent requests.
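As a minimal sketch of the step above, the following Python code builds the structure object from local files, computes the MD5 checksums, and prepares the POST request. It uses only the standard library. The helper names (md5_hex, build_upload_structure, make_create_request) are hypothetical, and authentication is left out because it is not described in this section.

```python
import hashlib
import json
import urllib.parse
import urllib.request
from pathlib import Path

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def md5_hex(path):
    """Return the MD5 checksum of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large images do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_upload_structure(title, pages):
    """Build the JSON structure for a new upload.

    pages is a list of (image_path, xml_path_or_None) tuples in page order.
    Only fileName and pageNr are required; checksums are added here as an
    optional integrity check.
    """
    page_objects = []
    for page_nr, (img, xml) in enumerate(pages, start=1):
        page = {
            "fileName": Path(img).name,
            "pageNr": page_nr,
            "imgChecksum": md5_hex(img),
        }
        if xml is not None:
            page["pageXmlName"] = Path(xml).name
            page["pageXmlChecksum"] = md5_hex(xml)
        page_objects.append(page)
    return {"md": {"title": title}, "pageList": {"pages": page_objects}}


def make_create_request(coll_id, structure):
    """Prepare (but do not send) the POST request that creates the upload."""
    query = urllib.parse.urlencode({"collId": coll_id})
    return urllib.request.Request(
        f"{BASE_URL}/uploads?{query}",
        data=json.dumps(structure).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The prepared request could then be sent with urllib.request.urlopen, and the uploadId read from the returned object once the response has been parsed.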

https://transkribus.eu/TrpServerTesting/rest/uploads/{uploadId}

This endpoint is used to PUT the files for each page to Transkribus. Note that the path now includes the uploadId from the response to the initial request.
The Content-Type of each request has to be multipart/form-data, and each request must include the complete data for one page: if a pageXmlName was set for the page in the structure object, then both the image and the XML have to be delivered. Whether the Content-Type has to be set explicitly depends on the HTTP library used; please refer to its documentation on multipart requests.
The body part names to be used are img and xml respectively, and both should be sent as application/octet-stream.
If checksums have been defined, the server verifies the files on each request and responds with status 200 only if they were transmitted without errors.
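The multipart request for a single page might look like the following sketch, which assembles the body by hand with the Python standard library. The function names are hypothetical, the filename attribute in each Content-Disposition header is an assumption (the text above only fixes the part names img and xml), and authentication is again omitted. Many HTTP libraries can build an equivalent body automatically.

```python
import uuid
import urllib.request

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def encode_page_multipart(img_name, img_bytes, xml_name=None, xml_bytes=None):
    """Encode one page as a multipart/form-data body.

    Returns (body, content_type). The parts are named "img" and "xml" and
    are both sent as application/octet-stream.
    """
    boundary = uuid.uuid4().hex
    parts = [("img", img_name, img_bytes)]
    if xml_bytes is not None:
        parts.append(("xml", xml_name, xml_bytes))

    body = bytearray()
    for field, filename, payload in parts:
        body += f"--{boundary}\r\n".encode("ascii")
        # Assumption: the server accepts a filename attribute on each part.
        disposition = (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
        )
        body += disposition.encode("utf-8")
        body += b"Content-Type: application/octet-stream\r\n\r\n"
        body += payload + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def make_page_request(upload_id, body, content_type):
    """Prepare (but do not send) the PUT request for one page."""
    return urllib.request.Request(
        f"{BASE_URL}/uploads/{upload_id}",
        data=body,
        headers={"Content-Type": content_type},
        method="PUT",
    )
```

One such request would be prepared per page in the structure object, with the XML part included only for pages that declared a pageXmlName.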
A GET request to this path can be used to check the status of the upload process at any time.
Once all files have been delivered successfully, the server automatically starts the ingest process. After the last PUT request has been accepted, the returned object includes a field jobId that can be used to monitor the ingest process via GET requests to:

https://transkribus.eu/TrpServerTesting/rest/jobs/{id}
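Monitoring the ingest job can be sketched as a simple polling loop. The fields of the job object are not described here, so the loop below takes a caller-supplied is_finished predicate rather than assuming a particular status field. The function names, the Accept header, the polling interval and the attempt limit are all illustrative assumptions.

```python
import json
import time
import urllib.request

BASE_URL = "https://transkribus.eu/TrpServerTesting/rest"


def job_url(job_id):
    """Return the URL used to query the ingest job with the given ID."""
    return f"{BASE_URL}/jobs/{job_id}"


def fetch_json(url):
    """GET a URL and parse the response body as JSON.

    Assumption: the server returns JSON when asked via the Accept header.
    """
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_until(fetch, is_finished, interval=5.0, max_attempts=60,
               sleep=time.sleep):
    """Call fetch() repeatedly until is_finished(result) is true.

    Returns the first result accepted by is_finished, or raises
    TimeoutError after max_attempts unsuccessful tries.
    """
    for attempt in range(max_attempts):
        state = fetch()
        if is_finished(state):
            return state
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"not finished after {max_attempts} attempts")
```

A caller would pass something like lambda: fetch_json(job_url(job_id)) as fetch, together with a predicate that matches whatever completion indicator the job object actually carries.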
