Hi,
I've been meaning to ask for a while whether any of you have experience with, or ideas about, what can be done with the raw data from a 23andMe analysis. In keeping with the philosophy that DNA data belongs to the person who carries it around in this world, I as their customer have access to a dump in text form (see the format example below).
I have one dump from 2011 (human assembly build 36) and another from 2014 (build 37). Each has almost a million lines, and I was surprised that they differ in nearly every line:
ruza@azur:~/tmp/23andme/$ wc -l *txt
 1927398 sorted.diff.txt
  966992 genome_Pavel_Ruzicka_Full_20110125063125.txt
  960629 genome_Pavel_Ruzicka_Full_20141220071038.txt
I once saw a list somewhere of alternative software that can read this format, and I think some of it was even open source. Rather than trying everything I can find on the internet (Promethease, for example, is a paid service), I'd rather take a recommendation. Or maybe you'll think of something I haven't? ;)
ruza
# This data file generated by 23andMe at: Sat Dec 20 07:10:38 2014
#
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP. For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
#
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid	chromosome	position	genotype
rs4477212	1	82154	AA
rs3094315	1	752566	AA
rs3131972	1	752721	GG
rs12124819	1	776546	AA
rs11240777	1	798959	AG
...
rs28503286	X	62321	CC
i5900247	X	66426	TT
rs28696811	X	92130	GG
rs6423165	X	169805	AA
...
rs5920024	X	145234227	A
rs7884816	X	145240709	T
rs5920027	X	145240758	A
rs9306713	X	145245430	T
rs5966055	X	145255283	T
...
rs2563086	Y	5057581	C
rs2563090	Y	5058273	--
rs2571870	Y	5060411	--
rs2571872	Y	5060531	--
rs35572756	Y	5060608	T
...
i4000691	MT	16488	C
i3001927	MT	16497	A
i4000690	MT	16518	G
rs3937033	MT	16519	C
i3001542	MT	16520	C
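For anyone who wants to start from the raw file rather than a service, here is a minimal Python sketch for reading this format. It assumes every data line has exactly the four tab-separated fields named in the header, and that the older build-36 file uses the same layout and the same "assembly build N" wording in its header. The helper names (`read_snps`, `assembly_build`, `summarize`) are hypothetical and not part of any 23andMe tool. The sketch counts SNPs per chromosome, `--` entries (which, as far as I know, mark no-calls), and single-letter calls like the X, Y and MT rows above.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional

class Snp(NamedTuple):
    rsid: str        # "rs..." id or a 23andMe internal "i..." id
    chromosome: str  # "1".."22", "X", "Y" or "MT"
    position: int    # coordinate on the assembly named in the header
    genotype: str    # two letters, one letter, or "--"

def read_snps(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield one Snp per data line; lines starting with '#' are header."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        yield Snp(rsid, chrom, int(pos), genotype)

def assembly_build(lines: Iterable[str]) -> Optional[int]:
    """Return N from a header line containing 'assembly build N', if any.
    Assumption: the older files phrase this the same way."""
    for line in lines:
        if not line.startswith("#"):
            break  # header is over, no build line found
        m = re.search(r"assembly build (\d+)", line)
        if m:
            return int(m.group(1))
    return None

def summarize(snps: Iterable[Snp]) -> dict:
    """Count SNPs per chromosome, '--' entries and single-letter calls."""
    per_chrom: Counter = Counter()
    no_calls = 0
    single_letter = 0
    for s in snps:
        per_chrom[s.chromosome] += 1
        if s.genotype == "--":
            no_calls += 1
        elif len(s.genotype) == 1:
            single_letter += 1
    return {
        "per_chromosome": dict(per_chrom),
        "no_calls": no_calls,
        "single_letter_calls": single_letter,
    }
```

Usage could look like `lines = open("genome_Pavel_Ruzicka_Full_20141220071038.txt").read().splitlines()`, followed by `assembly_build(lines)` and `summarize(read_snps(lines))`. Reading the file into a list lets both helpers walk the same lines, which should be fine at around a million rows.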