Hi,
for a while now I have been meaning to ask whether you have any experience with, or ideas about, what can be done with the raw data from a 23andme analysis. In line with the philosophy that DNA data belongs to whoever carries it around this world, I, as their customer, have a dump available in text form (see the format sample below).
I have one dump from 2011 (human assembly build 36) and another from 2014 (build 37). Each has almost a million lines, and I was surprised that they differ in almost every line:
ruza@azur:~/tmp/23andme/$ wc -l *txt
1927398 sorted.diff.txt
 966992 genome_Pavel_Ruzicka_Full_20110125063125.txt
 960629 genome_Pavel_Ruzicka_Full_20141220071038.txt
Somewhere I once saw a list of alternative software that can read this format, some of it possibly even open source. Rather than try everything I can find on the internet (Promethease, for instance, is a paid service), I would like to get a recommendation. Or maybe you can think of something that has not occurred to me? ;)
ruza
# This data file generated by 23andMe at: Sat Dec 20 07:10:38 2014
#
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP. For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
#
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid	chromosome	position	genotype
rs4477212	1	82154	AA
rs3094315	1	752566	AA
rs3131972	1	752721	GG
rs12124819	1	776546	AA
rs11240777	1	798959	AG
...
rs28503286	X	62321	CC
i5900247	X	66426	TT
rs28696811	X	92130	GG
rs6423165	X	169805	AA
...
rs5920024	X	145234227	A
rs7884816	X	145240709	T
rs5920027	X	145240758	A
rs9306713	X	145245430	T
rs5966055	X	145255283	T
...
rs2563086	Y	5057581	C
rs2563090	Y	5058273	--
rs2571870	Y	5060411	--
rs2571872	Y	5060531	--
rs35572756	Y	5060608	T
...
i4000691	MT	16488	C
i3001927	MT	16497	A
i4000690	MT	16518	G
rs3937033	MT	16519	C
i3001542	MT	16520	C
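For anyone who wants to read this format directly, here is a minimal Python sketch that follows only what the file header says: lines starting with "#" are comments and the remaining lines hold four TAB-separated fields. The names Snp and read_snps are made up for this illustration, and the sample rows are copied from the excerpt above. It is not an official 23andMe parser.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Snp(NamedTuple):
    """One data row of the dump (hypothetical helper type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def read_snps(handle: TextIO) -> Iterator[Snp]:
    """Yield one Snp per data line, skipping blank lines and '#' comments."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield Snp(rsid, chromosome, int(position), genotype)


# A few rows taken from the excerpt above, with tabs restored.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs2563090\tY\t5058273\t--\n"
    "i4000691\tMT\t16488\tC\n"
)
snps = list(read_snps(sample))
for snp in snps:
    print(snp)
```

For a real dump, open(path) can be passed in place of the io.StringIO object. The genotype is kept as a plain string because the excerpt shows two-letter calls, single-letter calls on X, Y and MT, and "--" entries.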
Hi, if they added or removed some parts of the sequencing, or changed the order, then I think a classic diff will break its teeth on it. If you want to look for mutations or sequencing errors, try this: for each rsid in one file, grep the matching line from the other file and compare the two. IMHO.
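The per-rsid comparison suggested here can be sketched in Python. This is an illustration, not a tested tool: instead of running grep once per rsid it loads both dumps into dictionaries keyed by rsid, which amounts to the same lookup done in memory. The function names are hypothetical, and the "old" positions in the sample data are invented placeholders, not real build 36 coordinates.

```python
import io


def load_by_rsid(handle):
    """Map rsid -> (chromosome, position, genotype), skipping '#' comments."""
    table = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chromosome, position, genotype = line.rstrip("\r\n").split("\t")
        table[rsid] = (chromosome, int(position), genotype)
    return table


def genotype_changes(old, new):
    """Return {rsid: (old_call, new_call)} for rsids in both dumps whose call differs."""
    shared = old.keys() & new.keys()
    return {
        rsid: (old[rsid][2], new[rsid][2])
        for rsid in shared
        if old[rsid][2] != new[rsid][2]
    }


# Made-up miniature dumps; only the rsids and the new positions come from the excerpt.
old_dump = io.StringIO(
    "rs3094315\t1\t1000\tAA\n"
    "rs3131972\t1\t2000\tGG\n"
    "rs11240777\t1\t3000\t--\n"
)
new_dump = io.StringIO(
    "rs3094315\t1\t752566\tAA\n"
    "rs3131972\t1\t752721\tAG\n"
    "rs11240777\t1\t798959\tAG\n"
    "rs12124819\t1\t776546\tAA\n"
)
changes = genotype_changes(load_by_rsid(old_dump), load_by_rsid(new_dump))
for rsid, (before, after) in sorted(changes.items()):
    print(rsid, before, "->", after)
```

Position is deliberately ignored, so a line that merely moved does not show up as a difference. A change from "--" to a letter pair is reported like any other change; whether that should count as a real difference is left to the reader.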
On 12/22/2014 09:47 PM, Pavel Ruzicka wrote:
[...]
Yeah, from a random sample it looks like the changes are in the position column.
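To check this across the whole file rather than a random sample, one could count which columns differ for every rsid present in both dumps. The Python sketch below does that on invented data; tally_changes and the sample rows are hypothetical. A large "position" count with few genotype changes would be consistent with the two dumps using different reference builds (36 and 37), though the sketch itself only counts and proves nothing about the cause.

```python
import io
from collections import Counter


def load_by_rsid(handle):
    """Map rsid -> (chromosome, position, genotype), skipping '#' comments."""
    table = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chromosome, position, genotype = line.rstrip("\r\n").split("\t")
        table[rsid] = (chromosome, int(position), genotype)
    return table


def tally_changes(old, new):
    """Count, per combination of changed columns, the rsids shared by both dumps."""
    names = ("chromosome", "position", "genotype")
    counts = Counter()
    for rsid in old.keys() & new.keys():
        changed = [n for n, a, b in zip(names, old[rsid], new[rsid]) if a != b]
        counts["+".join(changed) or "identical"] += 1
    counts["only_in_old"] = len(old.keys() - new.keys())
    counts["only_in_new"] = len(new.keys() - old.keys())
    return counts


# Invented rows: 'a' moved, 'b' moved and changed call, 'c' is unchanged.
old_dump = io.StringIO("a\t1\t100\tAA\nb\t1\t200\tGG\nc\t1\t300\tCC\nd\tX\t400\tA\n")
new_dump = io.StringIO("a\t1\t150\tAA\nb\t1\t250\tAG\nc\t1\t300\tCC\ne\tMT\t500\tC\n")
tally = tally_changes(load_by_rsid(old_dump), load_by_rsid(new_dump))
for kind, count in sorted(tally.items()):
    print(kind, count)
```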
On 12/23/2014 01:21 AM, Radek Pilar wrote:
[...]
Biolab mailing list Biolab@brmlab.cz https://brmlab.cz/cgi-bin/mailman/listinfo/biolab