ClinEff

Clinical Variant Annotations Software

ClinEff is a professional version of the SnpEff and SnpSift packages, suitable for production use in clinical labs. ClinEff combines the flexibility of multiple SnpEff/SnpSift commands with the simplicity of running one program to perform all the annotations at once (i.e. in a single pass). It is highly customizable and can be tailored to specific pipeline needs in clinical production environments.

Why ClinEff?

ClinEff is a professional version of the well-known SnpEff + SnpSift annotation suite.
ClinEff is designed for clinical sequencing environments.

Simple, fast, robust and reliable variant annotation, prioritization and reporting for whole genome sequencing, exome sequencing or gene panels.


  • Based on the leading variant annotation packages SnpEff & SnpSift.
  • Optimized for clinical labs, helps to deploy standardized workflows.
  • Effect prediction, prioritization & classification.
  • Cancer somatic vs. germline effects.
  • Standards: VCF, Sequence Ontology, HGVS, 'ANN' VCF fields.
  • Multiple sample support.
  • Annotations are applied based upon your pre-configured settings.
  • Supports many state-of-the-art databases.

Why a professional license?

A professional license facilitates customization and compliance

ClinEff's professional license simplifies deployments in clinical environments.


  • Reproducibility for CLIA and CAP certified analysis. Versioned databases.
  • Long Term Support for software and database: 2 years (optionally more)
  • Support for regulatory compliance
  • Customized data input / output formats and connectors to third party systems
  • Prioritized bug fixes
  • Prioritized feature development
  • Bug-fixes for Long Term Support versions
  • Customized genome references and databases (e.g. targeted sequencing)
  • Integration with open databases
  • Integration with private or proprietary databases
  • Customizable clinical report and formats
  • Privacy: Tickets, issues, pipeline-specific analysis and feature requests are completely private.

Download

ClinEff needs at least one of the genomic databases and a license

ClinEff Version    Databases v37    Databases v38
ClinEff v1.0g      hg19 / GRCh37    hg38 / GRCh38
ClinEff v1.0f
ClinEff v1.0e
ClinEff v1.0d
ClinEff v1.0c
ClinEff v1.0

License

License type                      Price
Academic / Non-Profit Research    Free: Request license
Clinical / Professional           $99.00 / Month + $1 per sample processed
(taxes not included)
If you experience any technical difficulties when purchasing a license, please contact us.

Documentation

The following sections describe some technical details about ClinEff

Requirements

  • Java 1.8 or higher.
  • Memory: At least 6 GB of RAM; 8 GB or more is highly recommended.
  • Operating system: It runs on any operating system that can run Java. Unix-based operating systems (such as Linux or OS X) are highly recommended.

Installing ClinEff

Don't like to install programs?
We can do it for you.
Contact us for a Cloud instance or a custom install.

To install ClinEff, simply uncompress the downloaded files in your $HOME directory:

$ cd                                 # Go to your home dir
$ tar -xzf clinEff_v*.tgz            # Uncompress the main program (note that the version number will change)
$ cd clinEff                         # Go to the newly created 'clinEff' directory
$ tar -xvf ../clinEff_db37_v*.tgz    # Uncompress the databases package downloaded to $HOME (use clinEff_db38_v*.tgz if you want GRCh38)
# You also need to install the license files
$ cd path/to/your/license_files
$ cp clineff.license* ~/clinEff/     # Copy your license to the ClinEff install directory

Running ClinEff

Don't like running programs?
We can do it for you.
Contact us for ClinEff as a service.

In order to run ClinEff, we need to provide some memory options to Java. We recommend using at least 6GB of RAM (-Xmx6G). So a basic command line is:

java -Xmx6g -jar clinEff.jar
Running the command without any options or files will show a basic help screen:
$ java -Xmx6g -jar clinEff.jar
ClinEff version ClinEff 1.0 (build 2016-12-04 12:00), by Pablo Cingolani
Usage: ClinEff [command] [options] genome [file.vcf]

Available command line options:
-c , -config <file>          : Specify config file. Default: clinEff.config
-db <file.vcf>               : Add annotations from database 'file.vcf'
-d , -debug                  : Debug mode (very verbose)
-h , -help                   : Show this help and exit
-l , -license <file>         : Path to license files
-v , -verbose                : Verbose mode
-version                     : Show version number and exit
-w , -workflow <file>        : Workflow file. Default: workflow.config
Some command line options can be specified using either a short or a long format (e.g. -c or -config).

Example run

$ java -Xmx8G -jar ClinEff.jar -v GRCh37.75 sample_1KG.vcf.gz > sample_1KG.ann.vcf
ClinEff version ClinEff 1.0 (build 2016-12-10 18:22), by Pablo Cingolani
00:00:00	Reading config file: clinEff.config
00:00:00	Reading workflow file: workflow.config
00:00:00	Adding annotation database (VCF): 'db/GRCh37/clinVar/clinVar.20161129.vcf'
00:00:00	Adding annotation database (VCF): 'db/GRCh37/dbSnp/dbSnp_v149.20161122.vcf.gz'
00:00:00	Adding annotation module : 'gwasCatalog'
00:00:00	Adding annotation module : 'dbNsfp'
00:00:00	Adding filter 'AF_1KG': ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
00:00:00	Adding filter 'AF_ESP': ((exists dbNSFP_ESP6500_AA_AF) && (dbNSFP_ESP6500_AA_AF >= 0.05)) || ((exists dbNSFP_ESP6500_EA_AF ) && (dbNSFP_ESP6500_EA_AF  >= 0.05))
00:00:00	Adding filter 'AF_EXAC': ((exists dbNSFP_ExAC_AF) && (dbNSFP_ExAC_AF >= 0.05))
00:00:00	Adding SnpEff annotations
00:00:00	Adding annotation module : 'com.clineff.report.ReportLof'
00:00:00	Adding annotation module : 'com.clineff.report.ReportHighImpact'
00:00:00	Adding annotation module : 'com.clineff.report.ReportClinical'
00:00:00	License file 'clinEff.license' OK
00:00:00	Reading database for genome version 'GRCh37.75' from file 'data/GRCh37.75/snpEffectPredictor.bin' (this might take a while)
00:00:22	done
...
00:01:55	Logging

Databases

Need help customizing your database?
We can do it for you.
Contact us for a database customization service.

ClinEff provides human genome databases for clinical applications. Note that a database name is sometimes referred to as 'genome' in the command line arguments.
There are several databases provided:

Database / Genome name    Genome version    Information source
hg19                      hg19 / GRCh37     UCSC RefSeq
GRCh37.75                 hg19 / GRCh37     ENSEMBL
hg19kg                    hg19 / GRCh37     UCSC KnownGenes
GRCh37.p13.RefSeq         hg19 / GRCh37     NCBI RefSeq
hg38                      hg38 / GRCh38     UCSC RefSeq
GRCh38.86                 hg38 / GRCh38     ENSEMBL
hg38kg                    hg38 / GRCh38     UCSC KnownGenes
GRCh38.p7.RefSeq          hg38 / GRCh38     NCBI RefSeq

Workflows

Need help customizing your workflow?
We can do it for you.
Contact us for a workflow customization service.

In a nutshell, ClinEff takes an input VCF file and applies a series of 'annotation modules'. Conceptually, each annotation module is similar to a single SnpEff / SnpSift command line. Instead of applying several independent tools, ClinEff optimizes the workflow by applying them in one step and defining them in a single 'workflow' file (as opposed to running several command lines glued together in a shell script). This improves efficiency, repeatability and clinical compliance.

ClinEff is customized using workflow definition files. The default workflow file is workflow.config in ClinEff's install directory, but an alternative file can be specified using the -w command line option. Workflow definition files define which annotation steps are used in ClinEff's annotation process. These annotation steps are known as 'annotation modules' or simply 'modules'. In the following paragraphs, we define the components of a workflow file.
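
As a sketch of how an alternative workflow file might be selected, the snippet below assembles a command line using the -w option listed in the help screen. The file names my_workflow.config, sample.vcf and sample.ann.vcf are hypothetical placeholders, and the command is only executed if clinEff.jar is present in the current directory.

```shell
# Hypothetical file names; adjust them to your own setup
JAR="clinEff.jar"
WORKFLOW="my_workflow.config"
INPUT="sample.vcf"
OUTPUT="sample.ann.vcf"

# Build the command: options go before the genome name, the VCF file goes last
set -- java -Xmx8g -jar "$JAR" -v -w "$WORKFLOW" GRCh37.75 "$INPUT"

# Show what would be run
echo "$* > $OUTPUT"

# Only run ClinEff if the jar is actually available here
if [ -f "$JAR" ]; then
    "$@" > "$OUTPUT"
fi
```

If -w is omitted, the default workflow.config described above is used.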

Annotation modules: This section defines the annotation modules applied. It is a comma separated list of modules, given either as module names or as Java class names. Functional annotations ('ANN') are always included, so there is no need to name them in this section.
In this example, we only use gwasCatalog and dbNsfp annotation modules:

modules.annotation : gwasCatalog, dbNsfp
Available modules
Module name   Corresponding SnpEff / SnpSift command  Module annotations
ann           SnpEff eff / ann                        Functional annotations, protein changes, putative impact and loss of function prediction
annotate      SnpSift annotate                        Annotation using a database file (e.g. custom VCF files)
caseControl   SnpSift caseControl                     Compare how many variants are in 'case' and in 'control' groups; calculate p-values. This is for VCFs having many samples (cohort analysis)
dbNsfp        SnpSift dbNsfp                          Annotate with multiple entries from dbNSFP. These annotations include Uniprot, Interpro, SIFT, Polyphen2, LRT, MutationTaster, FATHMM, MetaSVM, VEST3, PROVEAN, CADD, GERP++, phyloP46way, phastCons46way, SiPhy, 1000Gp1, ESP6500, ARIC5606, ExAC, COSMIC, etc.
filter        SnpSift filter                          Filter variants based on arbitrary expressions
filterChrPos  SnpSift filterChrPos                    Filter variants by genomic coordinates (i.e. chr:pos)
filterGt      SnpSift filterGt                        Filter genotypes using arbitrary expressions
geneSets      SnpSift geneSets                        Annotate gene sets using MSigDb (MSigDb includes: GO, KEGG, Reactome, BioCarta, etc.)
gwasCatalog   SnpSift gwasCatalog                     Annotate using GWAS catalog
hwe           SnpSift hwe                             Calculate Hardy-Weinberg parameters and perform a goodness of fit test
intervals     SnpSift intervals                       Filter variants by genomic intervals
intervalsIdx  SnpSift intervalsIndex                  Filter variants by genomic intervals. Index-based method: used for large VCF files when only a few intervals are retrieved
op            SnpSift vcfOperator                     Create a new field based on operations from other fields (e.g. get the maximum of two fields)
private       SnpSift private                         Annotate if a variant is private to a family or group (multi-sample VCF files with family tree structure information)
rmInfo        SnpSift rmInfo                          Delete an INFO field
rmRef         SnpSift removeReferenceGenotypes        Delete reference alleles
varType       SnpSift varType                         Annotate variant type (e.g. SNP, INS, DEL)

Annotation module arguments: This section defines additional command line arguments applied to each annotation module. The format is args.MODULE_NAME : arg1 arg2 ... argN. For instance, if we want functional annotations only in canonical transcripts and no up/downstream annotations, we could add the following line to our workflow:

args.ann : -canon -ud 0
Similarly, you can define parameters for other annotation modules (e.g. args.dbNsfp: ... ).

Module-specific parameters: Some annotation modules almost always require parameters. Instead of using the generic args.MODULE: ... workflow directive, these can be configured using module-specific entries.

Entry name            Module       Configuration                      Format
database.dbnsfp       dbNsfp       Path to dbNsfp database            Path to a valid dbNsfp 'database' file (the file must be bgzip compressed and tabix-indexed)
database.gwascatalog  gwasCatalog  Path to database file              Path to a valid GWAS Catalog file
dbnsfp.fields         dbNsfp       Defines dbNsfp fields to annotate  Comma separated list of fields to use for dbNSFP annotations (no spaces or tabs allowed in this list)

VCF database annotations: Often we need to annotate using VCF databases. A typical example is to use dbSnp and ClinVar. This can be defined using the annotation.db.vcf section, e.g.:

annotation.db.vcf : db/GRCh37/clinVar/clinVar.20161129.vcf \
                        , db/GRCh37/dbSnp/dbSnp_v147.20160601.vcf.gz
The database files must be either compressed using bgzip (and have a tabix index) or be in plain VCF (no compression).
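
A custom VCF database can be prepared for this entry with the bgzip and tabix tools from htslib, which are separate programs and not part of ClinEff. The following is a minimal sketch, assuming both tools are installed; the file name my_database.vcf and its single record are made up for illustration. Note that tabix expects records sorted by chromosome and position.

```shell
# Hypothetical database file with one record, created here only for illustration
DB="my_database.vcf"
printf '##fileformat=VCFv4.2\n' > "$DB"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$DB"
printf '1\t1000\t.\tA\tG\t.\t.\t.\n' >> "$DB"

# Compress and index only if bgzip and tabix (htslib) are available
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -f "$DB"              # writes my_database.vcf.gz and removes the plain file
    tabix -f -p vcf "$DB.gz"    # writes the index my_database.vcf.gz.tbi
else
    echo "bgzip/tabix not found: leaving $DB as plain VCF"
fi
```

Either the plain file or the compressed and indexed file could then be listed in annotation.db.vcf.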

Filters: Filtering variants is a common step in many clinical processing environments. A ClinEff workflow can define many filters and makes it easy to add and remove them.

Filters act by adding a 'filterName' to the FILTER VCF column. Remember that in VCF jargon, a variant passes filters if the FILTER entry is empty ('.') or has a PASS tag. This means that when a 'filterName' is added to the FILTER VCF column, you are actually excluding the variant from downstream analysis.
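
To make this convention concrete, here is a small sketch that keeps only passing variants from an annotated VCF using awk (a standard Unix tool, not a ClinEff command). The file names and the three example records are made up for illustration.

```shell
# Build a tiny example VCF: one PASS record, one filtered by 'AF_1KG', one with an empty FILTER
VCF="example.ann.vcf"
printf '##fileformat=VCFv4.2\n' > "$VCF"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$VCF"
printf '1\t1000\t.\tA\tG\t50\tPASS\t.\n' >> "$VCF"
printf '1\t2000\t.\tC\tT\t50\tAF_1KG\t.\n' >> "$VCF"
printf '1\t3000\t.\tG\tA\t50\t.\t.\n' >> "$VCF"

# Keep header lines, plus records whose FILTER column (column 7) is '.' or 'PASS'
awk -F'\t' '/^#/ || $7 == "." || $7 == "PASS"' "$VCF" > example.pass.vcf

# Count the records that passed
grep -vc '^#' example.pass.vcf
```

The record tagged AF_1KG is dropped, while the other two records are kept.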

Filters are defined using filter.FILTER_NAME followed by an arbitrary expression (a valid SnpSift filter expression). When the expression is satisfied, the FILTER_NAME tag is added to the FILTER VCF column.
Here is a filter definition example:

# This filter has a filterName 'AF_1KG'
# The filter is 'true' if the allele frequency from the 1000 Genomes Project is at least 0.05 (i.e. 5%)
# Note that if the filter is true, it will add a tag 'AF_1KG' to the FILTER VCF column

filter.AF_1KG  : ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
We can have many filter definitions (each should use a different 'filterName'). To enable a filter for a workflow, we need to add it to the filters list; filters that are not in the list are ignored. So, to activate the filter in our previous example, we need to add the following line:
filters : AF_1KG
The filters entry can be a comma separated list of 'filterNames'.

Reporting modules: This section defines the reporting modules. These are Java classes that can be customized for your reporting needs.
Contact us if you need to develop specific reports for your workflow. The format is a comma separated list of report classes, e.g.:

modules.report : com.clineff.report.ReportLof \
                , com.clineff.report.ReportHighImpact \
                , com.clineff.report.ReportClinical
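
Putting the pieces together, a complete workflow file might look like the sketch below, which simply combines the example entries from this section and writes them to a hypothetical file my_workflow.config. The AF_EXAC filter definition is taken from the example run log; the ordering of the entries and the use of '#' comment lines between them are assumptions.

```shell
# Write a hypothetical workflow file; the quoted 'EOF' keeps backslashes and parentheses literal
cat > my_workflow.config <<'EOF'
# Annotation modules ('ANN' functional annotations are always included)
modules.annotation : gwasCatalog, dbNsfp

# Functional annotation arguments: canonical transcripts only, no up/downstream
args.ann : -canon -ud 0

# VCF database annotations
annotation.db.vcf : db/GRCh37/clinVar/clinVar.20161129.vcf \
                        , db/GRCh37/dbSnp/dbSnp_v147.20160601.vcf.gz

# Filter definitions, followed by the list of enabled filters
filter.AF_1KG  : ((exists dbNSFP_1000Gp1_AF) && (dbNSFP_1000Gp1_AF >= 0.05))
filter.AF_EXAC : ((exists dbNSFP_ExAC_AF) && (dbNSFP_ExAC_AF >= 0.05))
filters : AF_1KG, AF_EXAC

# Reporting modules
modules.report : com.clineff.report.ReportLof \
                , com.clineff.report.ReportHighImpact \
                , com.clineff.report.ReportClinical
EOF

# List the filter definitions that were written
grep '^filter\.' my_workflow.config
```

Such a file could then be selected with the -w command line option instead of the default workflow.config.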

Workflow annotation steps

ClinEff annotation workflows can be highly customized, but all workflows follow these steps:

  1. Functional annotations: Workflow entry args.ann.
  2. VCF database annotations: Workflow entry annotation.db.vcf.
  3. Annotation modules: Workflow entries modules.annotation and args.MODULE, and module-specific entries.
  4. Annotation filters: Workflow entries filters and filter.FILTER_NAME.

Solutions for the precision medicine era.