Feature Extraction from Protein Sequence (FEPS)

The FEPS (Feature Extraction from Protein Sequence) webserver is a comprehensive web-based feature extraction tool. It computes the most common sequence-derived features, organized into 7 feature groups that together provide 48 feature extraction methods and 2765 descriptors. Combined with machine learning techniques such as SVM, Random Forest, and K-Nearest Neighbors, the extracted features can be used in bioinformatics classification problems such as protein function prediction, protein classification, protein structure prediction, protein localization prediction, and others.
Input: FASTA-formatted protein sequence file(s)

The input to the webserver is a FASTA-formatted protein sequence file. In a typical classification scenario, you have protein sequences for several groups (download the tutorial). Sequences belonging to the same group are saved together in a single multiple-sequence FASTA file. The input sequences must meet the following guidelines:

  • The sequences must be valid protein sequences
  • The sequences must be in FASTA format
  • The sequences of the same group are saved in one file
  • The file name can represent the group name
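
The guidelines above can be checked locally before uploading. The following is a minimal sketch, not the server's own validation: the function names are hypothetical, and it assumes that only the 20 standard one-letter amino acid codes count as valid, since the page does not say whether ambiguity codes such as X, B, or Z are accepted.

```python
from pathlib import Path

# The 20 standard amino acid one-letter codes. Whether the server also accepts
# ambiguity codes such as X, B, or Z is not stated, so this check is strict.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_records(records):
    """Return headers of records that are empty or contain non-standard letters."""
    return [h for h, seq in records if not seq or set(seq) - STANDARD_AA]


def group_name(path):
    """Use the file name without its extension as the group (class) name."""
    return Path(path).stem
```

For example, `invalid_records(parse_fasta(Path("kinase.fasta").read_text()))` lists the headers that would need attention, and `group_name("kinase.fasta")` gives `kinase` as the group name (the file name here is only an illustration).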

Protein sequence file list

Feature types
The features are divided into 7 types, each of which contains several feature extraction methods. Select a feature type and then select the corresponding feature type options from the drop-down menu.
Amino Acid Composition
Composition, transition and distribution
Autocorrelation Descriptors
Pseudo Amino Acid Composition
Shannon entropy descriptors
Other descriptors
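
To give a sense of what two of the simpler feature types compute, here is a minimal sketch using the textbook definitions: amino acid composition as the fraction of each of the 20 standard residues, and Shannon entropy of that residue distribution in bits. The function names are hypothetical, and the exact formulas and normalization used by FEPS may differ (see the supporting document).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each of the 20 standard residues, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    # Only standard residues contribute to the total (an assumption of this sketch).
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def shannon_entropy(sequence):
    """Shannon entropy (in bits) of the residue distribution of one sequence."""
    return -sum(p * math.log2(p) for p in amino_acid_composition(sequence) if p > 0)
```

Each sequence yields a fixed-length vector of 20 composition values regardless of its length, which is what makes such descriptors usable as machine learning input.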

Feature type options

Some feature types have options (see the supporting document). You may keep the default options or choose your own. Whenever 'ID Number' is an option, you can either select one of the 544 amino acid physicochemical properties from the drop-down menu or enter an ID number to specify the property.



Maximum lag:

Select a distance matrix:

Select one of the 544 amino acid physicochemical properties, or enter an Amino Acid index ID:

ID Number:

Amino Acid property:
Enter your AAP


Output file format

You can choose one or more file formats. The following are the most common feature file formats accepted by machine learning packages (e.g. Weka, SVM-light). Whenever the input file contains the sequences of a protein group, the last column of the output file holds the class labels.

Comma separated value (CSV) file
SVM-light file
Weka format file
Tab delimited text file
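
For readers unfamiliar with these formats, the sketch below renders the same feature rows as CSV and as SVM-light text. It is an illustration under stated assumptions, not the server's actual output: the function names are hypothetical, the CSV has no header row, and the SVM-light lines follow that package's own convention of placing a numeric target first, followed by `index:value` pairs with indices starting at 1. How FEPS arranges headers and labels in each file is not specified here.

```python
import csv
import io


def to_csv(rows, labels):
    """Render feature rows as CSV text, with the class label as the last column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for features, label in zip(rows, labels):
        writer.writerow(list(features) + [label])
    return buffer.getvalue()


def to_svmlight(rows, labels):
    """Render feature rows as SVM-light lines: '<label> <index>:<value> ...'.

    Feature indices start at 1; zero-valued features are omitted (sparse form).
    """
    lines = []
    for features, label in zip(rows, labels):
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
        lines.append(" ".join([str(label)] + pairs))
    return "\n".join(lines) + "\n"
```

With `rows = [[0.5, 0.0, 0.25]]` and `labels = [1]`, `to_csv` returns `0.5,0.0,0.25,1` and `to_svmlight` returns `1 1:0.5 3:0.25`, each followed by a newline.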

For your convenience, the output files can be sent to your e-mail address if you provide one.

Your e-mail (optional):