Re: split, no repeat- Regular expression
- From: "William James" <w_a_x_man@xxxxxxxxx>
- Date: 31 Aug 2005 03:12:37 -0700
Nina wrote:
> I have a file whose content looks like this:
>
> ATATTTGATTGGCCAGCCCTGCGTTTGCGGTTTTTTTTTGTTTTTTTATTTCCTGTATTTTTTTTGGGGGGGAAAAATTGCAGTTCCACGGA
> 4f-rnp Gene 204:267
> ACCTTATCGACTAGTATAAAAGGCACTGTCAGCTCTCCAGCCCGAACAAAATCGATCAAAATGCGCCCGCAATCAGCTGCGTGTCTATTACT
> 44D JMB 166:101
> ATGGGAGCGGTATGCTTAAATAGGGGCACCTTTTAATCCCTCTGGCCATTGGCAATCGATCCATTTAGTGGGAGCCATGTTCAAGTTGCTGG
> 44L JMB 166:101
> AACTTATGTAATCATATAGATTCTATAATAAACAAAGAAACAAAACTAGTTGTAAAACAAACACGATTCCTGTGTGTCATTGCGGGATATGG
> 74F EMBO 3:289
> TTTCCACACGATCGTGCTGCCTCCCAATAAACCCGGTGCAGTGAGTCAGTGTGTTGTGTGCCCCAGTCGCGAGCGGACGATCCGTGGAGATC
> Abdb EMBO 7:3223
> TGCGGATCAATTAAACCGTAAAAAACAGAGCAGGCGAGCGTAAGCAAGAGAGAGAGGTGAAGCCAGAGGCGGAGGCGCAAGACAAAGTGCAT
> abl p1 Oncogene 3:33
> AAAAAACAGAGCAGGCGAGCGTAAGCAAGAGAGAGAGGTGAAGCCAGAGGCGGAGGCGCAAGACAAAGTGCATTTTCAGGGCGTGTTTTTGA
> abl p2 Oncogene 3:33
> TAATAGTCGCTCAAAAGCTGTCGAGAGAGAGGGAGAGAAAAGAGAGAGTGAAAGCATAGTCCCGCTATTTTGCCGAGAGAAATAAAGAGCAG
> ace JMB 210:15
>
> For example, for the first sequence, what I want is the name on the line after it: 4f-rnp.
> Then I want to collect all of these names into a new file,
> so the new file looks like this:
> 4f-rnp
> 44D
> 44L
> 74F
> Abdb
> abl
> *Here I don't want a second abl; the output should not contain repeats.*
> ace
awk 'NF > 1 && 1 == ++a[$1] { print $1 }' datafile
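For anyone new to awk, here is a sketch of the same one-liner spelled out with comments and wrapped in a shell function. The function name `extract_names` and the output file `names.txt` are placeholders, not part of the original answer. Like the one-liner, it assumes every header line has at least two whitespace-separated fields and every sequence line has exactly one.

```shell
# Hypothetical helper: print the first word of every header line in the
# file named by $1, once each, in order of first appearance.
extract_names() {
    awk '
        # Sequence lines have one field and blank lines have none,
        # so only header lines pass this test.
        NF > 1 {
            # seen[$1] counts how many times this name has occurred;
            # print it only on the first occurrence.
            if (++seen[$1] == 1)
                print $1
        }
    ' "$1"
}

# Usage (placeholder file names): write the names to a new file.
# extract_names datafile > names.txt
```

The `NF > 1` test is what tells headers from sequences here. If a header could ever be a single word, filtering out lines made only of A, C, G and T (for example `!/^[ACGT]*$/`, which also skips blank lines) would be the safer test.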