This work was supported by a grant from the National Natural Science Foundation of China (30660044).
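Intron sequences of C. elegans ribosomal protein genes were locally aligned against their corresponding coding sequences using the Smith-Waterman method, to explore the mechanism of interaction between the two. The central part of introns was found to contain regions that interact with the corresponding coding sequences. For first introns, the best matches fall within 15%~55% of intron length; for second introns, within 30%~80%. For long introns aligned with exon sequences, the best matches fall within 5%~20% of intron length; when aligned with the whole coding sequence, two peak regions appear, one within 15%~30% and the other within 54%~78% of intron length. The first peak region is inferred to relate to sequences inside exons, and the second to sequences at exon-exon junctions. The coding sequences were also found to carry multiple domains that interact with intron sequences, along with some forbidden regions, which are inferred to relate to protein-binding regions. These conclusions support the view that intron sequences co-evolve with their corresponding coding sequences.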
Introns, a kind of non-coding DNA, are abundant in eukaryotic genomes. Apart from splicing, their functions and evolutionary mechanisms remain unclear. Introns are thought to play an important role in maintaining and regulating the functional structure of mRNA after splicing, during processes such as mRNA export and translation elongation. Moreover, an intron sequence and its corresponding coding sequence are thought to interact or co-evolve. This study examined the relations between intron sequences and their corresponding coding sequences. For the C. elegans ribosomal protein genes, 85 genes were selected from RPG (http://www.cbi.pku.edu.cn/chinese/mirrors.html). The intron sequences were divided into first introns, second introns, other introns, short introns, and long introns, and the corresponding coding sequences were divided into exons and complete protein coding sequences (CDSs). Matching local alignments between introns and the corresponding coding sequences were then computed with Smith-Waterman local alignment software. The results show that introns do contain interaction regions when aligned with coding sequences. When intron sequences are aligned with CDSs, the significant interaction regions for the first introns and the other introns are located at about 15%~55% of intron length, and those for the second introns at about 30%~80% of intron length. The distribution of interaction regions for short introns is similar to that for the first introns. For long introns, there are two significant interaction regions: the first peak region is located at about 15%~30% of the intron sequence and the second at about 54%~78%. When long introns are aligned with exons, there is only one peak region, located at about 5%~20% of intron length, in the upstream part of the intron. When CDSs are aligned with each kind of intron, many interaction regions and forbidden regions are found in the CDSs.
Two forbidden regions common to the CDSs were also found, located at 10% and 80% of the coding sequence length. The distribution of interaction regions for the first introns differs from that for the second introns. Comparing the distributions for long introns aligned with CDSs and with exons suggests that the segment in the first peak region acts on sequences inside exons, while the segment in the second peak region acts mainly on exon-exon junction regions. Furthermore, many peak regions and forbidden regions are distributed along the protein coding sequences. It is speculated that the forbidden regions may be binding regions for protein complexes. In summary, all parts of the intron sequences other than the 5' and 3' ends correlate closely with their corresponding coding sequences, or the two kinds of sequence segments have a co-evolutionary relation.
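The analysis described above (a local alignment of each intron against its coding sequence, followed by locating the best match as a percentage of intron length) can be illustrated with a minimal Python sketch. This is not the software used in the study: the scoring values (`match=2`, `mismatch=-1`, `gap=-2`), the use of the segment midpoint as the match position, the 10% bin width, and the function names `smith_waterman`, `relative_position`, and `position_histogram` are all assumptions made for illustration.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int, int, int, int]:
    """Best local alignment of a against b with a linear gap penalty.

    Returns (score, a_start, a_end, b_start, b_end), using 0-based,
    half-open coordinates. Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:  # keep the first cell reaching the top score
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, bi, j, bj


def relative_position(start: int, end: int, length: int) -> float:
    """Midpoint of the matched segment as a percentage of sequence length."""
    return 100.0 * (start + end) / 2.0 / length


def position_histogram(positions: List[float], bin_width: int = 10) -> List[int]:
    """Count best-match positions (in percent) per bin of the given width."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for p in positions:
        counts[min(int(p // bin_width), n_bins - 1)] += 1
    return counts


# Toy sequences, not data from the study.
intron, cds = "GGGGGACGTACCCCC", "TTACGTATT"
score, i_start, i_end, c_start, c_end = smith_waterman(intron, cds)
print(score, relative_position(i_start, i_end, len(intron)))
```

Collecting `relative_position` values over many intron and coding sequence pairs and passing them to `position_histogram` gives a distribution in which well-populated bins would correspond to peak regions and empty bins to candidate forbidden regions, under the assumptions stated above.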
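Zhao Xiaoqing, Li Hong, Bao Tonglaga. Interaction between introns and the corresponding coding sequences in C. elegans ribosomal protein genes [J]. Progress in Biochemistry and Biophysics, 2010, 37(9): 1006-1015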