Handling Time Periods in STAR
This page discusses issues associated with how dates and time periods are represented in STAR datasets and outlines various techniques developed during the project.
Background and methods
Within the STAR datasets, archaeological entities are typically associated with a time span rather than an absolute date. These time spans are expressed in a variety of textual forms, e.g. centuries, AD/BC dates, named Roman emperors or British monarchs, and three age system periods. To use these dates in any meaningful way, we first had to convert the data to a more regular form. We then needed a way to align these time periods to a controlled set of known periods (for STAR purposes, assuming England as the place concept associated with the period).
| Period description |
| MLC2-C3 |
| AD 341-6 |
| Iron Age |
| First half 1st century? |
| Antonine |
| LC2/EC3 |
| MLA |
Table 1 – examples of time periods encountered
Records containing date information were processed (semi-automatically) to give two numeric values representing the approximate lower and upper bounds of the time span indicated by the data record. These values could then be compared with other known time spans (and with each other) to determine containment, overlap etc. To ensure a consistent approach across databases, a controlled list of time periods was first collated: the English Heritage “Timelines” thesaurus and the English Heritage periods list (formerly known as the RCHME periods list) were supplemented with dates deduced from record description fields, concept scope notes, and online historical resources. Where there was not adequate evidence to determine even an approximate start and end date for a record, both the lower and upper bound values remained set to zero.
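As an illustration of the conversion step, the following is a minimal sketch of how the simplest label forms might be turned into lower and upper bounds. It is not the project's actual processing: the function name parse_year_range and the regular expression are assumptions made for this example, and it only handles plain year ranges such as "AD 341-6" or "20-15 BC". Anything else (named emperors, centuries, mixed BC/AD ranges) falls through to the zero/zero convention described above.

```python
import re

# Hypothetical pattern: optional leading "AD", a year, an optional
# "-year" suffix, and an optional trailing era marker.
_SIMPLE_RANGE = re.compile(
    r"^(?:AD\s*)?(\d+)(?:\s*-\s*(\d+))?(?:\s*(BC|AD))?$", re.IGNORECASE
)


def parse_year_range(label):
    """Return (min_year, max_year) for a simple label, or (0, 0) if unknown."""
    match = _SIMPLE_RANGE.match(label.strip())
    if match is None:
        # No usable date evidence: both bounds stay at zero.
        return (0, 0)
    start_text, end_text, era = match.groups()
    if end_text is None:
        # A single year is treated as a span of one year.
        end_text = start_text
    elif len(end_text) < len(start_text):
        # Abbreviated end year: "341-6" means 341 to 346.
        end_text = start_text[: -len(end_text)] + end_text
    start, end = int(start_text), int(end_text)
    if era and era.upper() == "BC":
        # BC years are stored as negative numbers.
        start, end = -start, -end
    return (start, end)


print(parse_year_range("AD 341-6"))
print(parse_year_range("20-15 BC"))
print(parse_year_range("Antonine"))
```

Labels such as "Antonine" or "MLC2-C3" would need lookup tables or further rules, which is presumably why the original processing was only semi-automatic.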
The resultant data was then further processed using a custom console-based application, STAR.TIMELINE, to assign a known time period identifier to each record. This allows clustering and searching for records, and also facilitates matching between database records and the grey literature. A semantic closeness measure (from previous work) for time periods was reused in the application to compare the start/end dates produced against a controlled list of known periods. Some periods overlap or are contained within others, so the matching method needs to accommodate these cases in order to suggest the most appropriate match. The STAR.TIMELINE application is made freely available for download and experimentation.
| P1 | P2 | Description |
| 0-150 | 0-150 | P1 equals P2 |
| 0-150 | 200-300 | P1 before P2 |
| 0-150 | 150-250 | P1 meets P2 |
| 0-150 | 100-200 | P1 overlaps P2 |
| 0-150 | 50-150 | P1 contains P2 |
Table 2 – example relationships between periods P1 and P2
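The relationships in Table 2 can be derived purely from the four boundary years. The sketch below is one possible way to do this, using the relationship names that appear in the sample results (During, Equals, Overlaps, OverlappedBy, Includes, Finishes, StartedBy); the remaining names, the function name relate, and the order in which cases are tested are assumptions for illustration, not the STAR.TIMELINE implementation.

```python
def relate(a_start, a_end, b_start, b_end):
    """Name the relationship of span A to span B from their boundary years."""
    if a_start == b_start and a_end == b_end:
        return "Equals"
    if a_end < b_start:
        return "Before"
    if a_start > b_end:
        return "After"
    if a_start == b_start:
        # Same start year; the shorter span "starts" the longer one.
        return "Starts" if a_end < b_end else "StartedBy"
    if a_end == b_end:
        # Same end year; the shorter span "finishes" the longer one.
        return "Finishes" if a_start > b_start else "FinishedBy"
    if a_end == b_start:
        return "Meets"
    if a_start == b_end:
        return "MetBy"
    if a_start < b_start:
        # A begins first: it either runs into B or extends past it.
        return "Overlaps" if a_end < b_end else "Includes"
    # A begins inside B: it either ends inside B or runs past it.
    return "During" if a_end < b_end else "OverlappedBy"


print(relate(0, 150, 150, 250))
print(relate(228, 231, 222, 235))
```

This sketch is finer grained than Table 2: where two spans share an end year (0-150 and 50-150), it reports FinishedBy, which is a special case of containment.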
A matching function was run against data extracted from a number of tables in the archaeological datasets. The approach allowed for multiple runs using alternative time period lists. The first run was against the English Heritage periods list (formerly known as the RCHME periods list), a fairly coarse-grained list encompassing a breakdown of the three age system. Some sample results from this run are shown in Table 3. As this data originated from the Raunds Roman (RRAD) database, perhaps predictably most of the records were tagged as “ROMAN”...
| Record ID | Record label | Record from | Record to | Matched period ID | Matched period label | Period from | Period to | Relationship |
| 1315 | AD 228-31 | 228 | 231 | 10 | ROMAN | 43 | 410 | During |
| 1316 | AD 364-78 | 364 | 378 | 10 | ROMAN | 43 | 410 | During |
| 1317 | AD 69-79 | 69 | 79 | 10 | ROMAN | 43 | 410 | During |
| 1318 | AD 270-4 | 270 | 274 | 10 | ROMAN | 43 | 410 | During |
| 1319 | AD 275-402 | 275 | 402 | 10 | ROMAN | 43 | 410 | During |
| 1320 | AD 341-6 | 341 | 346 | 10 | ROMAN | 43 | 410 | During |
| 1321 | AD 268-70 | 268 | 270 | 10 | ROMAN | 43 | 410 | During |
| 1322 | AD 367-75 | 367 | 375 | 10 | ROMAN | 43 | 410 | During |
| 1324 | AD 270-84 | 270 | 284 | 10 | ROMAN | 43 | 410 | During |
| 1325 | AD 270-84 | 270 | 284 | 10 | ROMAN | 43 | 410 | During |
| 1326 | AD 367-75 | 367 | 375 | 10 | ROMAN | 43 | 410 | During |
| 1327 | AD 383-8 | 383 | 388 | 10 | ROMAN | 43 | 410 | During |
| 1328 | AD 330-40 | 330 | 340 | 10 | ROMAN | 43 | 410 | During |
| 1337 | Post-medieval | 1540 | 1901 | 16 | POST MEDIEVAL | 1540 | 1901 | Equals |
| 1370 | Medieval | 1066 | 1540 | 28 | MEDIEVAL | 1066 | 1540 | Equals |
| 1371 | AD 1943 | 1943 | 1943 | 109 | SECOND WORLD WAR | 1939 | 1945 | During |
Table 3 – sample of data from RRAD object_period table processed using English Heritage periods list
The second run was against the English Heritage “Timelines” thesaurus data, which we had manually supplemented with start/end dates (based on scope notes and other information) to produce a much more fine-grained controlled list of known periods. The sample results for the same records re-run against this list are shown in Table 4. The Timelines thesaurus included the terms from the periods list, but also more detailed periods. Note how where appropriate a more detailed period has been automatically selected.
| Record ID | Record label | Record from | Record to | Matched period ID | Matched period label | Period from | Period to | Relationship |
| 1315 | AD 228-31 | 228 | 231 | 136122 | ALEXANDER SEVERUS | 222 | 235 | During |
| 1316 | AD 364-78 | 364 | 378 | 900014 | 3RD QUARTER 4TH CENTURY AD | 351 | 375 | OverlappedBy |
| 1317 | AD 69-79 | 69 | 79 | 136087 | VESPASIAN | 69 | 79 | Equals |
| 1318 | AD 270-4 | 270 | 274 | 136164 | TETRICUS I | 270 | 274 | Equals |
| 1319 | AD 275-402 | 275 | 402 | 134825 | 4TH CENTURY AD | 300 | 399 | Includes |
| 1320 | AD 341-6 | 341 | 346 | 900013 | 2ND QUARTER 4TH CENTURY AD | 326 | 350 | During |
| 1321 | AD 268-70 | 268 | 270 | 136154 | CLAUDIUS II GOTHICUS | 268 | 270 | Equals |
| 1322 | AD 367-75 | 367 | 375 | 900014 | 3RD QUARTER 4TH CENTURY AD | 351 | 375 | Finishes |
| 1324 | AD 270-84 | 270 | 284 | 135952 | LATE 3RD CENTURY | 266 | 299 | During |
| 1325 | AD 270-84 | 270 | 284 | 135952 | LATE 3RD CENTURY | 266 | 299 | During |
| 1326 | AD 367-75 | 367 | 375 | 900014 | 3RD QUARTER 4TH CENTURY AD | 351 | 375 | Finishes |
| 1327 | AD 383-8 | 383 | 388 | 900015 | 4TH QUARTER 4TH CENTURY AD | 376 | 399 | During |
| 1328 | AD 330-40 | 330 | 340 | 900013 | 2ND QUARTER 4TH CENTURY AD | 326 | 350 | During |
| 1337 | Post-medieval | 1540 | 1901 | 134746 | POST MEDIEVAL | 1540 | 1901 | Equals |
| 1370 | Medieval | 1066 | 1540 | 134745 | MEDIEVAL | 1066 | 1540 | During |
| 1371 | AD 1943 | 1943 | 1943 | 134848 | SECOND WORLD WAR | 1939 | 1945 | During |
Table 4 – same sample data records processed using EH Timelines thesaurus
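The effect of list granularity on the match can be illustrated with a very simple distance function. The sketch below ranks candidate periods by the sum of the absolute differences between start years and between end years. This is an assumed stand-in chosen for brevity, not the semantic closeness measure reused in STAR.TIMELINE; the names Period, distance and closest_periods are hypothetical, and the period data is copied from Tables 3 and 4.

```python
from typing import NamedTuple


class Period(NamedTuple):
    period_id: int
    label: str
    min_year: int
    max_year: int


def distance(rec_min, rec_max, period):
    """Assumed closeness: total mismatch of the start and end years."""
    return abs(rec_min - period.min_year) + abs(rec_max - period.max_year)


def closest_periods(rec_min, rec_max, periods, limit=3):
    """Return up to `limit` periods, closest first."""
    ranked = sorted(periods, key=lambda p: distance(rec_min, rec_max, p))
    return ranked[:limit]


# A few rows from the coarse-grained list (Table 3).
COARSE = [
    Period(10, "ROMAN", 43, 410),
    Period(28, "MEDIEVAL", 1066, 1540),
    Period(16, "POST MEDIEVAL", 1540, 1901),
]

# A few rows from the fine-grained list (Table 4).
FINE = [
    Period(136122, "ALEXANDER SEVERUS", 222, 235),
    Period(135952, "LATE 3RD CENTURY", 266, 299),
    Period(134825, "4TH CENTURY AD", 300, 399),
]

# Record 1315, "AD 228-31", matched against each list.
print(closest_periods(228, 231, COARSE, limit=1)[0].label)
print(closest_periods(228, 231, FINE, limit=1)[0].label)
```

With this distance, a short span is drawn to a similarly short period when one is available, which mirrors the more detailed matches seen in Table 4.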
As a result of this process, each database record containing dates now has an associated record holding those dates in a form suitable for effective cross searching, either directly by absolute date or by thesaurus term. The processed data will next need to be extracted to RDF, conforming to the CRM model for representing time period information. This can be achieved using the existing STAR data extraction tool.
Data formats
In the interests of simplicity STAR.TIMELINE imports and exports all data in CSV format. The fields for the named periods file are:
- periodID – identifier for the named period
- periodLabel – text label for the named period
- periodMinYear – numeric minimum year for the named period
- periodMaxYear – numeric maximum year for the named period
e.g.
    5, PALAEOLITHIC, -500000, -10000
    54, LOWER PALAEOLITHIC, -500000, -150000
    55, MIDDLE PALAEOLITHIC, -150000, -40000
    56, UPPER PALAEOLITHIC, -40000, -10000
    6, MESOLITHIC, -10000, -4000
    etc.
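For readers who want to work with a named periods file outside STAR.TIMELINE, a minimal sketch of reading the four fields is given here. It assumes the file has no header row and that labels contain no embedded commas; neither is confirmed by the description above. The function name read_named_periods is hypothetical.

```python
import csv
import io

# Sample rows taken from the named periods example.
SAMPLE = """5, PALAEOLITHIC, -500000, -10000
54, LOWER PALAEOLITHIC, -500000, -150000
6, MESOLITHIC, -10000, -4000
"""


def read_named_periods(handle):
    """Read named period rows into dictionaries keyed by the field names."""
    periods = []
    # skipinitialspace drops the space that follows each comma.
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) != 4:
            # Skip blank or malformed lines rather than guessing.
            continue
        period_id, label, min_year, max_year = row
        periods.append(
            {
                "periodID": int(period_id),
                "periodLabel": label.strip(),
                "periodMinYear": int(min_year),
                "periodMaxYear": int(max_year),
            }
        )
    return periods


periods = read_named_periods(io.StringIO(SAMPLE))
for period in periods:
    print(period["periodLabel"], period["periodMinYear"], period["periodMaxYear"])
```

To read one of the supplied files instead of the in-memory sample, the same function could be passed a handle from open("rchme.csv", newline="").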
The fields for the record data file are:
- recordID – identifier for the data record
- recordLabel – text label for the data record
- recordMinYear – numeric minimum year for the data record
- recordMaxYear – numeric maximum year for the data record
e.g.
    19,300-400,300,400
    20,31 BC-138 AD,-31,138
    21,31 BC-AD 14,-31,14
    22,3rd/4th century,200,400
    23,98-117,98,117
    24,AD 375-8,375,378
    etc.
The fields for the processed output file are:
- recordID
- recordLabel
- recordMinYear
- recordMaxYear
- periodID
- periodLabel
- periodMinYear
- periodMaxYear
- periodRelation – type of relationship between record and period
e.g.
    3,?1st century,1,100,26, PREHISTORIC OR ROMAN,-500000,410,During
    4,?Modern,1901,2030,24, 20TH CENTURY,1901,2000,StartedBy
    5,?Post medieval,1540,1901,16, POST MEDIEVAL,1540,1901,Equals
    10,1st-2nd century AD,1,200,10, ROMAN,43,410,Overlaps
    11,20-15 BC,-20,-15,67, LATE IRON AGE,-100,43,During
    12,250-400,250,400,10, ROMAN,43,410,During
    etc.
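The processed output supports the two kinds of cross search mentioned earlier: by absolute date and by period term. The sketch below shows one way this might be done over the output fields. It assumes no header row, and the function names load_output, records_for_period and records_in_year are hypothetical.

```python
import csv
import io

FIELDS = [
    "recordID", "recordLabel", "recordMinYear", "recordMaxYear",
    "periodID", "periodLabel", "periodMinYear", "periodMaxYear",
    "periodRelation",
]
YEAR_FIELDS = ("recordMinYear", "recordMaxYear", "periodMinYear", "periodMaxYear")

# Sample rows taken from the processed output example.
SAMPLE_OUTPUT = """3,?1st century,1,100,26, PREHISTORIC OR ROMAN,-500000,410,During
10,1st-2nd century AD,1,200,10, ROMAN,43,410,Overlaps
11,20-15 BC,-20,-15,67, LATE IRON AGE,-100,43,During
12,250-400,250,400,10, ROMAN,43,410,During
"""


def load_output(handle):
    """Read processed output rows, converting the year fields to integers."""
    rows = []
    reader = csv.DictReader(handle, fieldnames=FIELDS, skipinitialspace=True)
    for row in reader:
        for key in YEAR_FIELDS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def records_for_period(rows, label):
    """Search by term: IDs of records matched to the named period."""
    return [r["recordID"] for r in rows if r["periodLabel"] == label]


def records_in_year(rows, year):
    """Search by date: IDs of records whose span includes the given year."""
    hits = []
    for r in rows:
        if r["recordMinYear"] == 0 and r["recordMaxYear"] == 0:
            # Zero/zero marks a record with no usable date evidence.
            continue
        if r["recordMinYear"] <= year <= r["recordMaxYear"]:
            hits.append(r["recordID"])
    return hits


rows = load_output(io.StringIO(SAMPLE_OUTPUT))
print(records_for_period(rows, "ROMAN"))
print(records_in_year(rows, 150))
```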
For the purposes of experimentation there are some files already uploaded with the application:
- rchme.csv is a named periods file representing the English Heritage periods list.
- ehtimelines.csv is a named periods file representing periods from the English Heritage Timelines thesaurus.
- sampledata.csv is a file of sample records to be processed.
STAR.TIMELINE application
STAR.TIMELINE is an independent component of a larger project, written as an internal application for our own purposes. We will make the application source code freely available on request. It is a console-based application written in C#, so it requires the .NET Framework (v2) to be installed as a prerequisite. It also uses the FileHelpers component for file import/export operations. The application setup should automatically take care of any installation and configuration. The application has the following menu options:
- Clear named periods – clear the internal list of known periods
- Import named periods – populate the internal list of known periods from a specified CSV file
- Process data records – process the specified CSV file. The output will be the input filename plus “.output.csv”
- Get closest named periods – use the closeness algorithm to find the closest matching periods for the data record
To get started, try importing one of the supplied lists of known periods (rchme.csv or ehtimelines.csv) and interactively querying it using the last two menu options. You can also process the sample data records (sampledata.csv – output goes to sampledata.csv.output.csv). Note: the named period lists included here are for experimentation. The most up-to-date source of data for the English Heritage periods list remains the FISH website; for the “Timelines” thesaurus (not formally published) it is English Heritage Thesauri.
To download: STAR.TIMELINE application setup
STAR Timeline Service
As an extension of this work, an experimental URI-based web service and test client application were also produced, to perform the “Get closest named periods” functionality against the Timelines thesaurus as a web service call. Calls to the service take the simple form <prefix>/getRelatedPeriods?startYear=0&endYear=100. The returned data is in JSON format. The test client displays the resultant service call, and the returned data represented as both a list and a graphical timeline.
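A call to the service could be constructed along the following lines. This is a sketch only: the service prefix is not given here, so http://example.org/star is a hypothetical placeholder, and the function name related_periods_url is an assumption. The structure of the returned JSON is not documented on this page, so the example stops at building the URL.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real service prefix.
EXAMPLE_PREFIX = "http://example.org/star"


def related_periods_url(prefix, start_year, end_year):
    """Build a getRelatedPeriods call for the given span of years."""
    query = urlencode({"startYear": start_year, "endYear": end_year})
    return f"{prefix.rstrip('/')}/getRelatedPeriods?{query}"


print(related_periods_url(EXAMPLE_PREFIX, 0, 100))
```

Given a real prefix, the response could then be fetched with urllib.request.urlopen and decoded with json.load, with the resulting structure inspected before relying on any particular field.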
STAR Timeline Service test client
Publications
Tudhope D., Taylor C. 1997. Navigation via Similarity: automatic linking based on semantic closeness. Information Processing and Management, 33(2), 233-242. Elsevier Science. doi:10.1016/S0306-4573(96)00067-2
Binding C. 2010. Implementing archaeological time periods using CIDOC CRM and SKOS. Proceedings 7th Extended Semantic Web Conference, Heraklion, L. Aroyo et al. (Eds.): ESWC 2010, Part I, Lecture Notes in Computer Science, 6088, 273–287, Springer-Verlag Berlin Heidelberg.
Binding C. 2010. Archaeology and Terminology. EuroVoc 2010 Conference: Mind the lexical gap, EuroVoc, building block of the semantic web, EU Publications Office, Luxembourg.
