Investigate the function(s), if any, of the ORFan gene sequences identified in previous studies in the literature, with a special focus on the Clamp et al. study. The initial goal, however, is to identify all orphan genes in the human genome.
Human Genome GRCh38.p7 was used for the analysis.
A bioinformatics pipeline was developed in R, using the biomaRt package, to filter genes. biomaRt is a Bioconductor package that provides an R interface for querying BioMart databases such as Ensembl. The filtering process had six main steps:
- Removal of retrotransposons and pseudogenes
- Removal of genes with an ortholog in dog
- Removal of genes with an ortholog in mouse
- Removal of genes with a paralog in the human genome
- Removal of genes with known Pfam domains
- Removal of genes that lack a protein sequence (i.e. non-coding genes)
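
The six steps above amount to a chain of exclusion filters, each applied to the output of the previous one. The sketch below illustrates that logic only. It is written in Python on a small made-up table rather than in R with biomaRt, and it does not query Ensembl. The `Gene` fields, the biotype markers, and the toy records are all assumptions introduced for illustration; the real pipeline would obtain the corresponding annotations from Ensembl.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical per-gene annotation record (field names are assumptions)."""
    gene_id: str
    biotype: str
    dog_ortholog: Optional[str] = None
    mouse_ortholog: Optional[str] = None
    human_paralog: Optional[str] = None
    pfam_domain: Optional[str] = None
    peptide: Optional[str] = None


# Illustrative biotype markers for step 1; the real labels depend on the
# Ensembl release and are not taken from the source.
EXCLUDED_BIOTYPE_MARKERS = ("pseudogene", "retrotransposed")

# Each step is (label, keep-predicate). A gene survives a step only if the
# predicate returns True. The order mirrors the six steps in the list above.
FILTER_STEPS: List[Tuple[str, Callable[[Gene], bool]]] = [
    ("retrotransposons/pseudogenes",
     lambda g: not any(m in g.biotype for m in EXCLUDED_BIOTYPE_MARKERS)),
    ("dog orthologs", lambda g: g.dog_ortholog is None),
    ("mouse orthologs", lambda g: g.mouse_ortholog is None),
    ("human paralogs", lambda g: g.human_paralog is None),
    ("known Pfam domains", lambda g: g.pfam_domain is None),
    ("no protein sequence", lambda g: bool(g.peptide)),
]


def run_pipeline(genes: List[Gene]) -> Tuple[List[Gene], List[Tuple[str, int]]]:
    """Apply every filter in order; return survivors and per-step counts."""
    remaining = list(genes)
    log: List[Tuple[str, int]] = []
    for label, keep in FILTER_STEPS:
        remaining = [g for g in remaining if keep(g)]
        log.append((label, len(remaining)))
    return remaining, log


# Made-up records: G1..G6 are each removed by one step, G7 survives all six.
TOY_GENES = [
    Gene("G1", "processed_pseudogene"),
    Gene("G2", "protein_coding", dog_ortholog="D1", peptide="MKV"),
    Gene("G3", "protein_coding", mouse_ortholog="M1", peptide="MAL"),
    Gene("G4", "protein_coding", human_paralog="G9", peptide="MPT"),
    Gene("G5", "protein_coding", pfam_domain="PF00001", peptide="MGS"),
    Gene("G6", "lncRNA"),
    Gene("G7", "protein_coding", peptide="MSTA"),
]

if __name__ == "__main__":
    candidates, log = run_pipeline(TOY_GENES)
    for label, count in log:
        print(f"after removing {label}: {count} genes remain")
    print("orphan candidates:", [g.gene_id for g in candidates])
```

Keeping the steps in an ordered list makes it easy to record how many genes remain after each filter, which helps when comparing the final candidate set against previously published ORFan lists.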