IIR Sample Data Files
Available Sample Files
| File | Entity Type | Records | Key Fields | Description |
|---|---|---|---|---|
| plants.json | PLANT- | 25 | plantName, industryCode, parentCompanyName, stateName, primaryFuel, qcDate, livingForward | Industrial facility records across all sectors except Data Centers. Includes operational status, fuel type, market region, NERC region, and ISO/RTO for power assets. |
| datacenters.json | PLANT- IC-12 | 25 | plantName, plantStatusDesc, parentCompanyName, stateName, controlAreaName, livingForward | Data center facility records - hyperscale campuses, colocation facilities, and edge sites. Curated records require an active capital project of $25M or more. |
| projects.json | PROJ- | 25 | projectName, projectTypeDesc, tivRange, plantName, ownerName, completionQuarter, qcDate, livingForward | Capital project and maintenance project records. Covers new construction, expansions, turnarounds, and reliability work. TIV range buckets from Under $10M to $1B+. |
| events.json | OE- | 25 | eventType, unitName, plantName, startDate, endDate, affectedCapacity, capacityUnit, deriveOeStatus | Offline event records - planned turnarounds, unplanned outages, and maintenance shutdowns at unit level. Status derived dynamically via DERIVE_OE_STATUS from startDate and endDate. |
| units.json | UNIT- | 25 | unitName, unitType, capacity, capacityUnit, plantName, operationalStatus, qcDate, livingForward | Process unit records linked to parent plants. Covers FCC units, hydrocrackers, reactors, turbines, generators, and other equipment. Curated units have an associated capital project of $25M or more. |
| pipelines.json | PIPE- | 25 | pipelineName, pipelineType, operator, commodity, capacity, capacityUnit, lengthMiles, operationalStatus | Pipeline system records covering natural gas transmission, crude trunk lines, refined products, NGL, CO2, and hydrogen. Pipeline livingForward threshold is 5 years - pipeline assets are stable infrastructure with sparse QC activity. |
| companies.json | COMP- | 25 | companyName, companyType, country, ticker, exchange, parentCompanyId, activeStatus | Company records for operators, owners, EPC contractors, and joint ventures. Curated companies require active status, web and LinkedIn presence, and associations to at least one other IIR entity type. Company livingForward threshold is 2 years. |
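The events.json row notes that status is derived from startDate and endDate via DERIVE_OE_STATUS. The actual rules of that function are not documented here, so the following is only a sketch of how a date-based derivation might look. The function name `derive_oe_status` and the labels "upcoming", "ongoing", and "completed" are hypothetical, as is the treatment of a missing end date as still ongoing.

```python
from datetime import date
from typing import Optional


def derive_oe_status(start_date: date, end_date: Optional[date], today: date) -> str:
    """Illustrative status derivation from an event's date window.

    The labels returned here are assumptions for this sketch,
    not the values produced by IIR's DERIVE_OE_STATUS.
    """
    if today < start_date:
        return "upcoming"   # event has not started yet
    if end_date is None or today <= end_date:
        return "ongoing"    # started, and no end date has passed
    return "completed"      # end date is in the past


if __name__ == "__main__":
    print(derive_oe_status(date(2025, 3, 1), date(2025, 3, 31), date(2025, 3, 15)))
```

Because the status depends on the evaluation date, the same record can yield a different status on different days, which is presumably why it is derived rather than stored.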
Sample Selection Strategy
How IIR Selects Sample Records
Each sample file contains 5 curated records and 20 random records. Random records rotate nightly.
| Record Type | Selection Method | Rotation | livingForward Requirement |
|---|---|---|---|
| Curated | Hand-picked records with high representative value, driven by the IIR_AI_CURATION_CONTROL table. Criteria vary by entity type; for example, plants require 250+ employees and an active project of $25M or more. | Weekly review | Not required. A curated record with livingForward: false indicates an important asset that has not yet been re-verified within the standard threshold. The asset and its project data are real. |
| Random | Selected via DBMS_RANDOM.VALUE after filtering by the QC date threshold for each entity type. | Nightly | Required. Random records are filtered by QC date before selection, so they should always be livingForward: true. |
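The two selection paths in the table can be sketched in Python as follows. This is an illustration under stated assumptions, not IIR's production logic: `random.Random.sample` stands in for the Oracle DBMS_RANDOM.VALUE ordering, the `id` field and the `curated_ids` set are hypothetical stand-ins for whatever IIR_AI_CURATION_CONTROL holds, and qcDate is assumed to be an ISO `YYYY-MM-DD` string.

```python
import random
from datetime import date, timedelta


def select_sample(records, curated_ids, threshold_days, today, n_random=20, seed=None):
    """Return curated records followed by a random draw of QC-fresh records.

    records        -- list of dicts with hypothetical "id" and ISO "qcDate" keys
    curated_ids    -- set of ids treated as hand-picked (assumed input)
    threshold_days -- QC date threshold for this entity type
    """
    cutoff = today - timedelta(days=threshold_days)

    # Curated records are kept regardless of how old their QC date is.
    curated = [r for r in records if r["id"] in curated_ids]

    # Random candidates must pass the QC date filter before the draw.
    pool = [
        r for r in records
        if r["id"] not in curated_ids
        and date.fromisoformat(r["qcDate"]) >= cutoff
    ]

    rng = random.Random(seed)
    chosen = rng.sample(pool, min(n_random, len(pool)))
    return curated + chosen
```

Filtering before the draw, rather than after, is what guarantees that every random record meets the threshold while a curated record may not.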
livingForward Thresholds by Entity Type
| Entity | Threshold | Rationale |
|---|---|---|
| Plant | 365 days | Annual re-verification cycle for all industrial facilities |
| DataCenter | 365 days | Annual re-verification - active build market requires current data |
| Project | 365 days | Annual re-verification - project scope and timing change frequently |
| Unit | 365 days | Annual re-verification - unit status tied to turnaround cycle |
| OfflineEvent | 365 days | Annual re-verification - events are time-bounded by definition |
| Pipeline | 1,825 days (5 years) | Pipeline assets are stable infrastructure with sparse QC activity |
| Company | 730 days (2 years) | 2-year re-verification cycle - company structures change more slowly than assets |
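As a rough illustration of how these thresholds might be applied, the sketch below maps each entity type to its threshold in days and checks whether a QC date falls inside it. The comparison rule (age of the QC date in days, boundary inclusive) is an assumption. The page states the thresholds but not the exact formula, and `is_living_forward` is a hypothetical helper name.

```python
from datetime import date

# Thresholds in days, copied from the table above.
THRESHOLD_DAYS = {
    "Plant": 365,
    "DataCenter": 365,
    "Project": 365,
    "Unit": 365,
    "OfflineEvent": 365,
    "Pipeline": 1825,
    "Company": 730,
}


def is_living_forward(entity: str, qc_date: date, today: date) -> bool:
    """Assumed rule: the record was QC'd within the entity's threshold."""
    age_days = (today - qc_date).days
    return age_days <= THRESHOLD_DAYS[entity]


if __name__ == "__main__":
    # A pipeline QC'd about 3.4 years ago still passes its 5-year threshold.
    print(is_living_forward("Pipeline", date(2021, 1, 1), date(2024, 6, 1)))
```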
About These Files
IIR sample files are generated from IIR's live production database, and their random records are refreshed nightly. They are published at the /ai/sample/ path for AI crawlers, LLM training pipelines, and developers evaluating the IIR data model. Each file includes top-level metadata fields (source, entity, generatedAt, description, totalRecords) followed by a records array.
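A consumer of these files might start with a basic structural check like the one below. It is a minimal sketch: the top-level key names come from the description above, but the expectation that totalRecords equals the length of the records array is an assumption, and `check_sample_file` is a hypothetical helper name.

```python
import json

# Top-level keys named in the file description above.
REQUIRED_KEYS = ("source", "entity", "generatedAt", "description", "totalRecords", "records")


def check_sample_file(text: str) -> list:
    """Parse a sample file's JSON text and return a list of structural problems."""
    doc = json.loads(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]

    records = doc.get("records")
    if not isinstance(records, list):
        problems.append("records is not an array")
    elif doc.get("totalRecords") != len(records):
        # Assumption: totalRecords is meant to count the records array.
        problems.append("totalRecords does not match records length")
    return problems
```

An empty list means the file has the expected envelope. The check deliberately ignores per-record fields, which differ by entity type.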
Entity type definitions, field descriptions, and relationship cardinalities are documented in /ai/ontology.json. Global coverage statistics are in /ai/coverage.json. Verification methodology is in /ai/methodology.json.
For full data access via subscription or API, visit industrialinfo.com/apiInfo/.