Genomalysis

Introduction


Genomalysis is a Java application, currently implemented for Windows, that lets users mine and view the genomes and proteomes of various species. The project aims to provide a rich graphical user interface with which end users can search for and analyze sequences of interest. At present, Genomalysis can open, parse, and run mining functions on files containing genomic and proteomic data in FASTA format. A further goal is an extensible mechanism for bulk processing of sequence data, to support gene discovery and characterization efforts. Genomalysis was conceived by Benjamin Patterson, a master's-level biologist who studied at Humboldt State University. Some of the research that Genomalysis facilitated is described in his master's thesis (PDF, 1 MB). The original Java/Swing implementation was written by Wolfgang Meyers.

Data mining


In Genomalysis, data mining currently consists of selecting and configuring a set of sequence filters that are applied to the sequences in an input file. When the filters are executed, each sequence is tested against each filter in turn and receives a pass/fail result at each step. Sequences that pass every filter are written to an output file. The following sequence filters are currently implemented in Genomalysis:
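
As a rough illustration of the pass/fail chaining described above, here is a minimal sketch in Java. It is not taken from the Genomalysis source: the class FilterPipelineSketch, the Sequence type, and the two example filters (lengthBetween and containsMotif) are hypothetical names chosen for this example, and the sketch returns the passing sequences as a list rather than writing them to an output file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the Genomalysis code base.
final class FilterPipelineSketch {

    /** One sequence record: a header plus its residues. */
    static final class Sequence {
        final String header;
        final String residues;

        Sequence(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    /** Illustrative filter: passes sequences whose length lies in [min, max]. */
    static Predicate<Sequence> lengthBetween(int min, int max) {
        return s -> s.residues.length() >= min && s.residues.length() <= max;
    }

    /** Illustrative filter: passes sequences that contain the given motif. */
    static Predicate<Sequence> containsMotif(String motif) {
        return s -> s.residues.contains(motif);
    }

    /** Tests each sequence against each filter in turn; keeps it only if all pass. */
    static List<Sequence> applyFilters(List<Sequence> input,
                                       List<Predicate<Sequence>> filters) {
        List<Sequence> passed = new ArrayList<>();
        for (Sequence sequence : input) {
            boolean allPassed = true;
            for (Predicate<Sequence> filter : filters) {
                if (!filter.test(sequence)) {
                    allPassed = false; // first failure rejects the sequence
                    break;
                }
            }
            if (allPassed) {
                passed.add(sequence);
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        List<Sequence> input = Arrays.asList(
                new Sequence("a", "MKTAYIAK"),
                new Sequence("b", "MK"),
                new Sequence("c", "AAAAAAAA"));
        List<Sequence> kept = applyFilters(
                input, Arrays.asList(lengthBetween(5, 20), containsMotif("KTA")));
        for (Sequence s : kept) {
            System.out.println(s.header);
        }
    }
}
```

Modelling each filter as a Predicate keeps the pipeline open to new filters without changing applyFilters, which is one plausible way to get the extensibility the project aims for.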

Sequence viewing


Currently, Genomalysis can be used to view the sequences contained in a FASTA-formatted file. When such a file is opened, Genomalysis displays the number of sequences the file contains and lists them, so that you can select individual sequences for viewing.
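
To make the counting and listing step concrete, here is a minimal sketch of reading FASTA records in Java. It is an assumption about how such a reader might look, not the Genomalysis parser: the class FastaIndexSketch, the Entry type, and the parse method are hypothetical. It relies only on the basic FASTA convention that a line starting with ">" begins a new record and the lines that follow hold its residues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the parser used by Genomalysis.
final class FastaIndexSketch {

    /** One FASTA record: the header text after '>' and the joined residue lines. */
    static final class Entry {
        final String header;
        final String residues;

        Entry(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    /** Groups the lines of a FASTA file into records. */
    static List<Entry> parse(Iterable<String> lines) {
        List<Entry> entries = new ArrayList<>();
        String header = null;
        StringBuilder residues = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    entries.add(new Entry(header, residues.toString()));
                }
                header = line.substring(1).trim();
                residues.setLength(0);
            } else if (header != null) {
                residues.append(line); // residues may span several lines
            }
        }
        if (header != null) {
            entries.add(new Entry(header, residues.toString()));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<Entry> entries = parse(Arrays.asList(
                ">seq1 example protein", "MKT", "AYI", "", ">seq2", "GGC"));
        System.out.println("Sequences: " + entries.size());
        for (Entry entry : entries) {
            System.out.println(entry.header);
        }
    }
}
```

For a file on disk, the lines could be supplied by java.nio.file.Files.readAllLines(path); a very large genome file would more likely be streamed line by line instead of loaded whole.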