Python package: WGDpseudogenes
1. Identification of all remnants of pseudogenes
Use multiplicons from PLAZA to link the different chromosomal regions (Proost et al. 2015).
Input from PLAZA: segments (dated), annotation, duplication type (tandem, anchor points (dated)), proteome
• Get anchor points of the most recent WGD (Ks 0.1-0.6)
• Segments for those anchor points
o ad_segments (species=ptr, exp_id=1) -> first and last gene of the segments
• Annotation of the segments
o coordinates of first and last gene -> coordinates of segments
o get all genes on the segment
o mark anchor points and tandem duplicates
• Create pseudoalignment between the segments
o based on anchor points
• Define potential regions
o Genes that have no counterpart on the aligned segment
• Extract the potential regions from the genome
o Region between the preceding and following anchor point
• tblastn of the opposing gene on the potential region
• Filter blast results
o Percent identity > 30%
o E-value < 0.5
o Length > 50 amino acids
• Collapse and link fragments:
o Different fragments of the gene put together
o Fragments are like exons
• extra annotations
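The filtering and collapsing steps above can be sketched as follows. This is a minimal illustration and not the package's actual code: it assumes the tblastn results are in BLAST tabular format with the default column order (query, subject, percent identity, alignment length, ..., subject start, subject end, E-value, bit score), and the names Hit, parse_hits, passes_filter and collapse_fragments are hypothetical. Merging overlapping hits is one possible way to "put fragments together"; the thresholds are the ones listed above.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    query: str     # gene used as the tblastn query
    region: str    # potential region that was searched
    pident: float  # percent identity
    length: int    # alignment length (assumed to be in amino acids)
    sstart: int    # hit start on the region
    send: int      # hit end on the region
    evalue: float


def parse_hits(lines):
    """Parse tab-separated BLAST lines (assumed default tabular columns)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[8]), int(f[9]), float(f[10]))


def passes_filter(hit, min_pident=30.0, max_evalue=0.5, min_length=50):
    """Apply the three thresholds from the outline."""
    return (hit.pident > min_pident
            and hit.evalue < max_evalue
            and hit.length > min_length)


def collapse_fragments(hits):
    """Group hits per (query gene, region) and merge overlapping intervals.

    The result is an ordered list of exon-like fragments per query gene.
    """
    grouped = defaultdict(list)
    for h in hits:
        # Hits on the minus strand have sstart > send, so order them first.
        grouped[(h.query, h.region)].append(tuple(sorted((h.sstart, h.send))))
    collapsed = {}
    for key, intervals in grouped.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        collapsed[key] = merged
    return collapsed


# Made-up example lines, for illustration only.
example = [
    "geneA\tregion1\t45.0\t80\t40\t2\t1\t80\t100\t339\t1e-10\t60.0",
    "geneA\tregion1\t38.0\t60\t35\t1\t90\t150\t900\t721\t1e-5\t40.0",
    "geneA\tregion1\t25.0\t70\t50\t3\t1\t70\t2000\t2209\t0.1\t25.0",
    "geneB\tregion2\t50.0\t40\t20\t0\t1\t40\t10\t129\t1e-4\t35.0",
]
fragments = collapse_fragments(
    h for h in parse_hits(example) if passes_filter(h)
)
print(fragments)
```

In this example the third line is dropped for low identity and the fourth for being too short, so only two fragments of geneA remain on region1.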
The output contains regions with anchor points, retained duplicates, and also regions without genes. These last regions have the potential to contain remnants of duplicates.
2. Analyses of BLAST results => Remnant regions?
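How a gene-free region could be derived from the pseudoalignment can be sketched as below. This is an assumption-laden illustration, not the package's implementation: it reads "region between the preceding and following anchor point" as the interval on the opposing segment that lies between the partners of the two anchor genes flanking an unpaired gene. The Gene record and the potential_regions function are hypothetical names, and tandem duplicates are ignored for brevity.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    gene_id: str
    start: int
    end: int
    partner: Optional[str] = None  # anchor-point partner on the other segment


def potential_regions(segment_a, segment_b):
    """Map each unpaired gene on segment_a to an interval on segment_b.

    The interval lies between the partners of the nearest anchor genes
    before and after the unpaired gene.
    """
    genes_b = {g.gene_id: g for g in segment_b}
    genes_a = sorted(segment_a, key=lambda g: g.start)
    anchor_idx = [i for i, g in enumerate(genes_a) if g.partner in genes_b]
    regions = {}
    for i, gene in enumerate(genes_a):
        if gene.partner is not None:
            continue  # gene has a counterpart, nothing to search for
        before = [j for j in anchor_idx if j < i]
        after = [j for j in anchor_idx if j > i]
        if not before or not after:
            continue  # no anchor on one side; skipped in this sketch
        left = genes_b[genes_a[before[-1]].partner]
        right = genes_b[genes_a[after[0]].partner]
        # Segments can be inverted relative to each other, so order by start.
        lo, hi = sorted((left, right), key=lambda g: g.start)
        if lo.end < hi.start:
            regions[gene.gene_id] = (lo.end + 1, hi.start - 1)
    return regions


# Made-up coordinates, for illustration only.
seg_a = [Gene("A1", 100, 200, "B1"), Gene("A2", 300, 400), Gene("A3", 500, 600, "B3")]
seg_b = [Gene("B1", 1000, 1100, "A1"), Gene("B3", 1500, 1600, "A3")]
print(potential_regions(seg_a, seg_b))
```

Here A2 has no counterpart, so the interval between B1 and B3 on the opposing segment becomes the region to search with A2 as the tblastn query.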
• Annotate the features of the regions
o RNA seq coverage
o Repeat elements between the different fragments of a gene
 Can still be further annotated to TEs
o Retention group of the query gene: Single/Intermediate/multi/Non-core (Plant cell paper)
• Other things that can be added (TO DO)
o Mutation rate
 Compare mutation rate with other regions/genes
 Mutation hotspots with sliding window
 SNPs
o Remnant gene structure => annotation
 Pfam
o Crossing out of parts of the genome
 Loss due to mutations or outcrossing?
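For the sliding-window item in the TO DO list, one possible starting point is sketched below. It is only an illustration under stated assumptions: mutation positions are taken to be 0-based coordinates within a region (for example mismatch or SNP positions), and the function name, window size and step are hypothetical choices, not values from this project.

```python
def sliding_window_counts(positions, region_length, window=100, step=50):
    """Count mutations in each window along a region.

    Returns a list of (window_start, window_end, count) with half-open
    windows [window_start, window_end).
    """
    results = []
    last_start = max(region_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        count = sum(1 for p in positions if start <= p < end)
        results.append((start, end, count))
    return results


# Made-up positions, for illustration only.
counts = sliding_window_counts([5, 10, 60, 120], region_length=200)
print(counts)
```

Windows with counts well above those of comparable regions or genes could then be flagged as candidate hotspots; how to set that threshold is left open here.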
