When a new file is uploaded, dotCMS automatically extracts and stores the file metadata in the content as additional fields. These fields can then be accessed from your Elasticsearch queries and Velocity code to either search on or display the file metadata.
The types of metadata that dotCMS stores are configurable and intentionally limited by default. For more information, see the How Content is Mapped to ElasticSearch documentation concerning metadata.
Extracting File Metadata
To extract metadata from a new file:
1. Open the Site Browser.
2. Navigate to the folder where you want to upload the file.
3. Right-click the folder and select New → Image / File.
4. Choose the File Asset Type you want to upload and click Select.
5. Select a file from your computer and click Upload.
The file information is displayed on the page, and the available actions are listed on the right under Actions. These actions vary based on the workflow scheme associated with your File Asset Type.
6. Once your file appears in the Site Browser, double-click the file or right-click the file and select Edit to edit it.
7. In the Add/Edit File screen, select the Metadata tab. All the metadata extracted from the file will be displayed.
The contents of the file are also extracted into a metadata field named contents. This enables you to search inside the file contents of many of the most popular document types, including PDF, text, Word, Excel, and more. This field is not displayed on the Metadata tab because it contains the entire document’s content; however, it is available via Content tab searches and from Velocity code.
Since different document types contain different metadata fields, the metadata fields accessible within dotCMS also differ for each document type. For example, the following image displays the metadata for a JPG image, while the previous image displays the metadata for a PDF document. Image files have image-specific metadata fields, such as height and width, which do not exist (or make sense) for PDF documents.
Searching Metadata Fields
Once you have uploaded a file, the metadata fields and values are stored in the file system (within metadata.json) and also indexed using Elasticsearch, enabling you to perform normal Elasticsearch queries on the metadata. To perform a metadata search:
- Open the Content tab.
- Select File Asset in the Type field.
- This displays all the files you have created as content (uploaded to dotCMS).
- Click the Advanced link under the search button.
- Enter your metadata search information in the Metadata search field.
For example, to search for JPG images, enter contentType:*image/jpeg* in the Metadata search field. To search for PDF documents enter
Search Document Contents
To search inside file contents, use the content: keyword as if it were a metadata field name. The following example searches for all files that have the term footer-nav inside the text of the file contents:
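
If you generate search terms from a script rather than typing them, the field-name/value form described above can be assembled programmatically. The following Python sketch is illustrative only: metadata_term is a hypothetical helper, not part of dotCMS, and it assumes that every term follows the same field:*value* wildcard form as the contentType example and that the content: keyword accepts that form too. It does not escape special characters.

```python
def metadata_term(field: str, value: str) -> str:
    """Build a wildcard search term of the form field:*value*.

    Hypothetical helper for illustration; it only formats a string and
    performs no escaping of special query characters.
    """
    if not field or not value:
        raise ValueError("field and value must be non-empty")
    return f"{field}:*{value}*"


# Term for JPG images, matching the example given in the text.
jpeg_term = metadata_term("contentType", "image/jpeg")

# Assumed form for a file-contents search using the content: keyword.
contents_term = metadata_term("content", "footer-nav")

print(jpeg_term)
print(contents_term)
```

The resulting strings can be pasted into the Metadata search field. Check them against your dotCMS version before relying on them.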