workflow

1. get CLIP-seq data

1.1 download from http://lulab.life.tsinghua.edu.cn/postar/download.php

1.2 get from /BioII/lulab_b/shared/projects/POSTAR2/CLIP_seq/bed

human_RBP_eCLIP_hg38.txt

chr1    187044  187087  human_RBP_eCLIP_ENCODE_1        0       -       AARS    eCLIP   K562    ENCODE  3.02982873699729
chr1    267418  267466  human_RBP_eCLIP_ENCODE_2        0       -       AARS    eCLIP   K562    ENCODE  5.50800008227284
chr1    630751  630792  human_RBP_eCLIP_ENCODE_3        0       -       AARS    eCLIP   K562    ENCODE  3.5454755693998

explanation for each column

column1:chromosome
column2:peak start 
column3:peak end
column4:name
column5:0
column6:strand
column7:RBP name
column8:CLIP-seq technology, peak calling method
column9:cell line or tissue
column10:data accession
column11:score(Piranha score: Peak heights from the CLIP-seq data; PARalyzer score: T-to-C transition ratios ranging from 0 to 1, ratios greater than 0.5 indicate protein-binding while less than 0.5 indicate without protein-binding; CIMS score: Mismatch peak heights from the CLIP-seq data; CITS score: Truncated peak heights from the CLIP-seq data; eCLIP score: -log10 P-value, the threshold is 3.)

2. prepare interested bed and background bed

2.1 interested bed

up.bed

chr1    1167104 1167134 ENSG00000207730.3__1167104__1167134__1  .       +
chr1    1167119 1167149 ENSG00000207730.3__1167119__1167149__2  .       +
chr1    1167134 1167164 ENSG00000207730.3__1167134__1167164__3  .       +

2.2 prepare background bed from interested bed

the same as method in sequence_motif (1) downstream/upstream 1000bp (2) shuffle the input sequence (3) shuffle the genome sequence

3. overlap between CLIP-seq peaks and interested/background bed

use intersectBed (bedtools) command to get the overlaps of CLIP-seq peaks

intersectBed -wa -wb -s -a up.bed -b human_RBP_eCLIP_hg38.txt >up.eCLIP.hg38.txt
intersectBed -wa -wb -s -a up_background.bed -b human_RBP_eCLIP_hg38.txt >up_background.eCLIP.hg38.txt

4. compare RBP enrichment

4.1 percentage of region with CLIP-seq peaks

cut -f 4 up.bed |sort -u|wc -l
1685
cut -f 4 up.eCLIP.hg38.txt |sort -u|wc -l
663
cut -f 4 up_background.bed |sort -u|wc -l
1685
cut -f 4 up_background.eCLIP.hg38.txt |sort -u|wc -l
544

4.2 RBP enrichment in each region

cat <(cut -f 4,13 up.eCLIP.hg38.txt |sort -u|cut -f 1|sort|uniq -c|sed 's/^[ \t]*//g'|sed 's/ /\t/g'|awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,"up"}') <(cut -f 4,13 up_background.eCLIP.hg38.txt |sort -u|cut -f 1|sort|uniq -c|sed 's/^[ \t]*//g'|sed 's/ /\t/g'|awk 'BEGIN{FS="\t";OFS="\t"}{print $1,$2,"up_bg"}')|sed '1i density\tname\tclass' >RBP_enrichment.txt

screen short of “RBP_enrichment.txt”

density name    class
1       ENSG00000004897.11__47141883__47141913__96      up
2       ENSG00000104723.20__15540418__15540448__14      up
2       ENSG00000115758.12__10441554__10441584__100     up

boxplot of RBP enrichment

example