From Voror_Wiki

(Redirected from Discovery Query)

Discovery query is an application that supports the defining of data sets based on query of data within Discovery. In this article only the simple query application is specified at this time

Simple Query

The basic provision supports a simple 'generic' query service, which is an extract and report service provided by the Discovery Data Service.

The service is designed to enable rapid and frequent extracts of small or large data sets from the Discovery Data Service, where the definition of those data sets conform to a simple pattern.

The application is a query editor and runner user interface which presents a set of constrained options that enables a user with basic informatics skills to define simple but highly valuable reports from the Discovery analytical databases.

The simple query service is a semi automated process involving the  extract of a set of files or tables from the data service. The content of the files should cover the source data requirements of a subscriber to the Discovery Data service. Files or tables may be aggregate counts or line (event level) entries derived from health records.

The content of the extracts are governed by the Discovery data processing agreements (DPA) , the Discovery data sharing agreements (DSA) and the Discovery project definitions (DPC). 

The generic query service is not suitable for advanced analytical rule based data sets, but instead provides either simple outputs, or extensive data sets that can then be subsequently analysed in depth. For subscribers seeking sophisticated rule based outputs, the standard Discovery Data Distribution service, or reporting service should be used.

The following description will indicate whether or not the simple generic query service should be used for a particular purpose.

The service is initiated by a data access agreement, accompanied  by a data set definition as provided by a subscriber, with the data set definition being constrained to the generic query service's simple query parameters. In other words, the service is used when the parameters of a data set definition are in a simple form.

Simple data set definition

The configuration parameters of a simple data set definition fall into two parts:

Input parameters, which include the defining of a cohort, with criteria which include the selection of simple broad types of data from the Discovery common information model, and the filtering of data entries by single attribute value sets. 

Output parameters, which include whether the output is event level (e.g. 1 line per event and 1 line per patient), or aggregate, where the aggregation is by one common attribute type such as multi-level organisational grouping, personal demographic grouping, or event level date time grouping.

Cohort definition

The simple generic query service supports simple definitions of cohorts of patients. The following configuration options are supported i.e. one or more of the following:

Provider organisations List of organisations, or list of organisations groups, whose records will be used as the base cohort. An example may be all GP practices in London, or a list of a small number of practices in Tower hamlets. This MUST be present

Included organisations This is the list of organisations whose records are included for the patients who are in the base cohort, but not included if they are not. For example, patient who have records in a GP practice may be in the base, and they may have visited hospital and the hospital may be included in the included organisation list. If not defined NO other organisational records are included.

Patient GP registration status Whether the cohort is limited to currently registered patients (if the organisations are practices), or whether the cohort may include patients who have left or died.

One Value set per entity type Patients that have entries containing concepts that are part of a value set,  i.e. one set of concepts, either built as a "combined set of sets" or a single set

One demographic, One age range or gender list, or a list of high level post codes for the patient's main home residence


Input criteria

Input criteria includes a collection of  ONE or more items, each of which includes one or more, high level types of entities from the information model (listed below) and one type of filter applied to each.

The following table illustrates available entities

Data entity type Attribute value sets/ranges
Person Age range in years, months or days / Gender male/ female /other, list of high level post codes e.g. LS24, NW1
Observations (including rconditions, values, procedures, family history, referrals, allergies

All observations of an inclusion Observation type value set i.e a value set whose definition is known to the information model. Example may be diabetics, all diseases, or an exclusion value set (e.g. a list of sensitive codes)

Relative date range. A date range of observations relative to the day of extract, in days , weeks, or years

Medication orders or issues

Medicine type value set whose definition is known to DDS information model e.g. oral non steroidal anti-inflammatory drugs

Relative date range. A date range of observations relative to the day of extract, in days , weeks, or years

Encounters, including consultations, admissions, discharges, outpatient attendances

Optionally an Encounter type value set known to the DDS information model e.g. hospital encountes, real GP consultations

Relative date range. A date range of observations relative to the day of extract, in days , weeks, or years




Output definition

The following table illustrates the types of output and parameters associated with each type


Output type Attribute value sets/ranges

Counts which can br broken down by ONE of the following

1. Organisational grouping e.g. by STP and by CCG and by practice

2. Date range groupings for the observation or medication or encounter tables e.g. daily weekly yearly. 

3. High level post codes

4. Age

Event level

For each observation, medication or encounter table, event selection from the following attributes

1. Patient ID - This is a table level identifier that links the tables to thesame patient

2. Patient NHS number - or pseudonymised NHS number which is an alternative link to 1.

3. Effective date

4. Concept e.g Anginal, systolic blood pressure, Atenolol 50 mg

5. Owning organisation (the practice or organisation responsible for this entry)

6. Numeric value where present

And for the patient level

1. Post code

2. Age (approx)

3. Gender (approx)

4. Registered organisation

5. Death status


Extract Schedule

Subscribers can set an extract schedule being either, daily, weekly, monthly (date of each month) or quarterly, or none


Delivery mechanisms

Subscribers must set one of the following delivery mechanisms and have the right levels of infrastructure in place to receive and process the data.

Output type Description
NHS email

This is suitable for aggregate reports or smallish  data sets suitable for email attachments 

Encrypted cloud drive

A cloud drive such as Google or one drive where the data resides temporarily by agreement and made available to specific named person and is downloaded with single factor authentication to unzip the files

Secure ftp This is the mechanism required for large or regular scheduled data sets and requires the subscriber to use an "ftp client" to initiate the download from the Data service's ftp server
Sextant database Sends the data and automatically files into a data base conforming to the compass database schema (mysql MSSQL or other)