Software - High-throughput Genomics & Systems Biology

Software

ORNA

ORNA is read normalization (reduction) software that removes redundancy from large NGS datasets while affecting downstream assembly approaches as little as possible. See the project page for more info.
DA Durai, MH Schulz
In-silico read normalization with set multicover optimization,
presented at Recomb-seq [preprint]
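
To give a feel for what read normalization means in practice, here is a minimal Python sketch. It is a simplified greedy heuristic and not ORNA's actual algorithm: the function names, the k-mer size `k`, and the coverage `target` are assumptions chosen for illustration. A read is kept only while it still contributes at least one k-mer that is under-represented among the reads kept so far.

```python
from collections import Counter


def kmers(read, k):
    """Return all overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_reads(reads, k=4, target=2):
    """Greedy reduction (illustrative only, not ORNA's implementation).

    A read is kept if at least one of its k-mers has been seen fewer
    than `target` times among the reads kept so far. Reads shorter
    than k have no k-mers and are dropped.
    """
    kept = []
    coverage = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        if any(coverage[km] < target for km in read_kmers):
            kept.append(read)
            coverage.update(read_kmers)
    return kept


# Toy dataset: one highly redundant read plus one unique read
reads = ["ACGTACGT"] * 5 + ["TTGCAAGC"]
reduced = normalize_reads(reads, k=4, target=2)
```

On the toy input, redundant copies of the first read are discarded once its k-mers are covered, while the unique read is retained.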

TEPIC

TEPIC is a software package for integrative analysis of open-chromatin datasets for TF binding affinity prediction. It can be combined with gene annotation to provide gene-level estimates of TF regulation. See the project page for more info.
F Schmidt, N Gasparoni, G Gasparoni, K Gianmoena, C Cadenas, JK Polansky, P Ebert, KJV Nordström, M Barann, A Sinha, S Fröhler, J Xiong, A Dehghani Amirabad, F Behjati Ardakani, B Hutter, G Zipprich, B Felder, E Jürgen Eils, B Brors, W Chen, JG Hengstler, A Hamann, T Lengauer, P Rosenstiel, J Walter, MH Schulz
Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction,
Nucleic Acids Research, 29 November 2016, full text

KREATION

KREATION is meta-software that enables parameter selection for de novo transcriptome assemblers. It also has a misassembly filter that can be used to improve the quality of assemblies. See the project page for more info.

D Durai, MH Schulz
Informed kmer selection for de novo transcriptome assembly
Bioinformatics, 2016 [pdf]

DREM 2.0

DREM is an input-output HMM-based method that reconstructs dynamic maps of TF regulation from expression time series data. The software is written in Java and has a rich user interface for network display and downstream analyses. See the project page for more info.

MH Schulz*, WE Devanny*, A Gitter, S Zhong, J Ernst and Z Bar-Joseph
DREM 2.0: Improved reconstruction of dynamic regulatory networks from time-series expression data
BMC Systems Biology [full text]

SEECER

SEquencing Error CorrEction for RNA sequencing data (SEECER) is a read error correction method designed for non-uniform sequencing data sets, in particular RNA-seq. It clusters reads and reconstructs profile HMMs to correct indel and substitution errors with high sensitivity. See the project page for more info.
H Le, MH Schulz*, BM McCauley, V Hinman, and Z Bar-Joseph
Probabilistic error correction for RNA sequencing
Nucleic Acids Research [full text]

Oases

Oases is a de novo transcriptome assembler that exploits similarities between de Bruijn graphs and splicing graphs to reconstruct full-length mRNAs from RNA-Seq data with high sensitivity. It is one of the most widely used methods in its class. See the project page for more info.

MH Schulz*, DR Zerbino*, M Vingron and E Birney
Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels
Bioinformatics 28 (8): 1086-1092 [full text]
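
The link between de Bruijn graphs and splicing graphs can be illustrated with a small Python sketch. This is a toy construction, not Oases code: the function names and the two example sequences are hypothetical. Two sequences that share a prefix and then diverge, loosely mimicking alternative isoforms, produce a branching node in the graph.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Return nodes with more than one successor, i.e. where paths diverge."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two toy "isoforms" sharing a common prefix and then diverging
reads = ["ATGGCGTT", "ATGGCGAA"]
graph = de_bruijn_graph(reads, k=4)
branches = branching_nodes(graph)
```

Here `branches` contains the single node at which the two sequences split, which is the kind of structure a splicing graph represents as alternative exon usage.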

Phenomizer

The Phenomizer is a web service for clinical diagnosis. The physician can query the database to get predictions of the best matching disease for a set of observed phenotypes. Try it out here.

S Köhler, MH Schulz, P Krawitz, S Bauer, S Doelken, CE Ott, C Mundlos, D Horn, S Mundlos and PN Robinson
Clinical Diagnostics with Semantic Similarity Searches in Ontologies
The American Journal of Human Genetics, 85 (4):457-64 [full text]

Fiona

Fiona is a genome read error correction method that corrects indel and substitution errors using several word lengths. It automatically adapts the parameters to the data and supports OpenMP parallelization. See the project page for more info.

MH Schulz^,*, D Weese*, M Holtgrewe*, V Dimitrova, S Niu, K Reinert, H Richard^,*
Fiona: a parallel and automatic strategy for read error correction
Bioinformatics 30 (17): i356-i363, ECCB 2014 proceedings [full text]

SplazerS

SplazerS is a split-read indel alignment method with high sensitivity that works for single and paired-end reads. See the project page for more info.
AK Emde, MH Schulz, D Weese, R Sun, M Vingron, VM Kalscheuer, SA Haas and K Reinert
Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS
Bioinformatics 28 (5): 619-627 [full text]

ALF

ALF is a toolbox of alignment-free sequence similarity measures, including a new measure, N2, that accounts for mismatches and the reverse complement of words. N2 was designed to accommodate inexact overlaps, as in degenerate binding sites or reads with sequencing errors, and we have shown that it outperforms other common alignment-free measures in both performance and runtime. See the project page for more info.
J Göke, MH Schulz, J Lasserre and M Vingron
Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with Word Neighbourhood Counts
Bioinformatics 28 (5): 656-663 [full text]
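
As a rough illustration of alignment-free comparison, the Python sketch below counts words (k-mers) in two sequences, merges each word with its reverse complement, and compares the count vectors with cosine similarity. This is deliberately much simpler than N2: it does not model mismatches or standardize the counts, and all function names and the default `k` are assumptions made for this example.

```python
from collections import Counter
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reverse complement of a DNA word over A, C, G, T."""
    return word.translate(COMPLEMENT)[::-1]


def word_counts(seq, k):
    """Count k-mers, merging each word with its reverse complement."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Use the lexicographically smaller of word and reverse complement
        counts[min(word, reverse_complement(word))] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity(seq1, seq2, k=3):
    """Strand-independent word-count similarity (simplified, not N2)."""
    return cosine_similarity(word_counts(seq1, k), word_counts(seq2, k))
```

Because words and their reverse complements share one counter, a sequence and its reverse complement get the same count vector, so the score does not depend on which strand was sequenced.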