mtDNA haplotype network analysis: Exploring genetic relationships and diversity in dog haplogroups

The genetic diversity and relationships of dog haplogroups were studied by analyzing the HV1 region of mitochondrial DNA. Previous studies have found six distinct haplogroups (A, B, C, D, E, and F) in dogs. Haplogroups A, B, and C are widely distributed, while haplogroups D, E, and F are rare and restricted to particular regions. In this study, HV1 sequences from global dog populations were collected, categorized into haplotypes, and used to construct haplotype networks. The results showed that haplogroup A was the most prevalent, comprising approximately 72.34% of dogs worldwide. Haplogroups A, B, and C together accounted for around 97.40% of the global dog population. Haplogroups D, E, and F were rare, constituting less than 3% of the dog population, and haplogroups E and F made up only about 1-2%. The number of haplotypes in haplogroups D, E, and F was small, suggesting that these haplotypes were introduced into the canine population more recently, with limited time for mutations to accumulate. Analysis of the haplotype networks showed that haplogroup A haplotypes were introduced into dog populations at an early stage of dog domestication. Dogs harbouring haplogroup E haplotypes were genetically close to wolves, suggesting a recent introduction of haplogroup E. Similarly, haplogroup F had a narrow distribution, primarily in Japanese dogs, with haplotype F3 identified as the founder haplotype, likely introduced from a few wolves carrying it. Through the analysis of the haplotype networks and assessment of betweenness values, this study identified important haplotypes contributing significantly to the dog population. These analyses offer insights into the founder haplotypes involved in the formation of dog breeds worldwide and serve as a reference for breed development and genetic studies.


Introduction
The HV1 region of mitochondrial DNA in dogs displays diverse nucleotide sequences with various mutations, including nucleotide insertions and deletions. Each of these unique sequences is known as a haplotype. Savolainen et al. (2002) constructed a phylogenetic tree using HV1 sequences from 654 dogs, representing a wide range of dog breeds [1]. The study revealed six distinct haplogroups, namely A, B, C, D, E, and F, which were also supported by Pang et al. (2009) using a larger sample size of 1543 dogs [1,2]. These findings suggest that the physical characteristics of dog breeds are influenced by migration and interbreeding between breeds rather than by the domestication of wolves in different regions.
Survey findings indicate that approximately 72.34% of dogs worldwide belong to haplogroup A, and around 97.40% belong to haplogroup A, B, or C [2]. Haplogroups A, B, and C are widely distributed across the globe, except that haplogroup C is absent in the Americas [3]. Conversely, haplogroups D, E, and F are considered rare, comprising less than 3% of the global dog population, with haplogroups E and F accounting for only about 1-2%. Haplotypes in haplogroups D, E, and F are primarily restricted to specific regions. For example, haplogroup D is found in Turkey, Spain, and Scandinavia; haplogroup E in Siberia, Japan, Korea, Indonesia, Thailand, and Vietnam; and haplogroup F in Japan [2,4].
The widely accepted hypothesis is that domestic dogs descended from gray wolves, a view supported by archaeological evidence and genetic studies. However, the exact time and location of dog domestication remain subjects of debate [2,5-8]. To address this, researchers have used different DNA markers, including mitochondrial DNA. Shannon et al. analyzed nuclear and mitochondrial DNA from 4676 dogs and pointed to Central Asia as the region of dog origin [9]. Vila et al. analyzed HV1 sequences in wolves and domestic dogs and proposed that dogs were domesticated from wolves over 100,000 years ago [10]. Savolainen et al. (2002) challenged this hypothesis, also using the HV1 region of mitochondrial DNA. They analyzed a sequence of 582 base pairs (from nucleotide 15458 to nucleotide 16039) in 654 domestic dogs from major breeds worldwide. Their findings indicated that the East Asian domestic dog lineage originated from a single genetic lineage derived from wolves approximately 15,000 years ago [1]. In summary, the hypothesis of dog domestication from wolves has garnered support, but there are differing perspectives on the specific time and location.
A more recent study by Wang et al. (2016) has made significant progress in confirming the origin of domestic dogs. Through analysis of the entire dog genome, the authors not only affirmed previous findings regarding the domestication timeframe but also constructed a migration map of historical dog movements. Their research suggests that domestication took place around 33,000 years ago in Southeast Asia [7]. Approximately 15,000 years ago, a group of dogs began migrating towards the Middle East and Africa, eventually reaching Europe around 10,000 years ago. From the Middle East, some dogs migrated back to the east and interbred with local Asian canines, resulting in a genetically diverse population in northern China before further migration to the New World. However, Wang et al.'s study also highlights unanswered questions regarding dog migration within Africa and within the New World. These areas require further investigation before dog migration patterns can be understood comprehensively.
In this study, we employed social network analysis to gain insights into the connections among dog haplotypes across the globe. We collected 512 bp sequences of the HV1 region of canine mitochondrial DNA from around the world, sourced from GenBank. These sequences were categorized into haplotypes and used to construct a network representing haplotype relationships. In this network, each haplotype is represented as a node, and a line connects adjacent haplotypes, illustrating the nucleotide differences between them. By examining the closeness and betweenness values, we aimed to pinpoint the significant haplotypes within the network.
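The network idea described above can be illustrated with a minimal sketch. It assumes aligned sequences of equal length, links two haplotypes when they differ at exactly one position, and computes closeness by breadth-first search. The function names and the 6-bp toy haplotypes are hypothetical, and this is not the software workflow used in the study.

```python
from collections import deque
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_network(haplotypes):
    """Link every pair of haplotypes that differ at exactly one position."""
    graph = {name: set() for name in haplotypes}
    for (n1, s1), (n2, s2) in combinations(haplotypes.items(), 2):
        if hamming(s1, s2) == 1:
            graph[n1].add(n2)
            graph[n2].add(n1)
    return graph


def closeness(graph, source):
    """(number of reachable nodes) / (sum of shortest-path lengths), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical 6-bp toy haplotypes (not real HV1 data).
toy = {"H1": "ACGTAC", "H2": "ACGTAT", "H3": "ACGTTT", "H4": "GCGTAC"}
net = one_step_network(toy)
scores = {h: closeness(net, h) for h in net}
```

In this toy chain (H4, H1, H2, H3), the two interior haplotypes receive higher closeness than the two at the ends, which is the sense in which central haplotypes stand out in a network.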

Data collection
The reference sequence for domestic dog mitochondrial DNA, consisting of 582 base pairs (GenBank U96639.2), was used in this study. Using the BLAST tool, this sequence was compared against the GenBank nucleotide database to identify similar sequences. A stringent E-value threshold of 10e-94 was applied to ensure high similarity. The majority of sequences returned by BLAST were mitochondrial DNA sequences resembling the reference sequence and originating from individuals of Canis lupus. For each sequence, the corresponding GenBank accession number was used to download the full record from GenBank, including the nucleotide sequence, organism name, and sequence annotation. Only sequences belonging to Canis lupus were retained, and the relevant information was extracted for further analysis.
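The filtering step can be sketched as follows. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the accession strings are hypothetical placeholders, and matching on an organism name that starts with "Canis lupus" (so that subspecies such as Canis lupus familiaris are kept) is an assumption about how the species filter was applied.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 10e-94  # threshold quoted in the text; numerically equal to 1e-93


@dataclass
class Hit:
    accession: str  # hypothetical placeholder accessions are used below
    organism: str
    evalue: float


def keep_hit(hit, cutoff=E_VALUE_CUTOFF):
    """Keep hits at or below the E-value cutoff that are annotated as Canis lupus."""
    return hit.evalue <= cutoff and hit.organism.startswith("Canis lupus")


hits = [
    Hit("ACC0001", "Canis lupus familiaris", 1e-120),
    Hit("ACC0002", "Canis lupus", 5e-100),
    Hit("ACC0003", "Vulpes vulpes", 1e-110),          # wrong species
    Hit("ACC0004", "Canis lupus familiaris", 1e-50),  # E-value above the cutoff
]
retained = [h.accession for h in hits if keep_hit(h)]
```

Only the first two toy hits pass both conditions.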

Sequence alignment and mutation identification
The collected sequences were aligned with the reference sequence (GenBank U96639.2) using ClustalW [11]. Each difference between a sequence and the reference sequence was recorded as a mutation. The set of mutations of a sequence was taken as its mutation profile.
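As a rough sketch of how a mutation profile might be derived from a pairwise alignment, the function below walks two aligned strings and records substitutions, insertions, and deletions in reference coordinates. The start coordinate 15458 is taken from the fragment boundaries quoted in the Introduction; the notation (`15459C>T`, `15460insC`, `15462del`) and the toy fragments are assumptions made for illustration, not the study's own format.

```python
REF_START = 15458  # first reference position of the HV1 fragment, per the text


def mutation_profile(ref_aln, seq_aln, ref_start=REF_START):
    """List differences of an aligned sequence from the aligned reference.

    Both strings come from the same alignment and use '-' for gaps.
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    pos = ref_start - 1  # reference coordinate of the last reference base seen
    for r, s in zip(ref_aln, seq_aln):
        if r != "-":
            pos += 1
        if r == s:
            continue
        if r == "-":
            profile.append(f"{pos}ins{s}")  # base inserted after position pos
        elif s == "-":
            profile.append(f"{pos}del")     # reference base at pos is missing
        else:
            profile.append(f"{pos}{r}>{s}")  # substitution at pos
    return profile


# Hypothetical aligned toy fragments, not real HV1 data.
ref = "ACG-TAC"
seq = "ATGCT-C"
profile = mutation_profile(ref, seq)
```

Two sequences with the same profile would then be treated as the same haplotype.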

Haplotype naming
Sequences matching previously published haplotypes were assigned the published names. Sequences harbouring new mutation profiles were named using the format XnNum, where X represents the corresponding haplogroup, n indicates new, and Num denotes the sequence number. For example, An100 is the new haplotype with number 100 in haplogroup A.
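The naming rule can be sketched in a few lines. The sketch assumes that the haplogroup of a new profile is already known and passed in, since the text does not say how it was determined. The lookup table, the pairing of the name A11 with a particular mutation, and the counter state are all hypothetical.

```python
def assign_name(profile, haplogroup, known_names, counters):
    """Return the published name for a known profile, else a new XnNum name."""
    key = frozenset(profile)
    if key in known_names:
        return known_names[key]
    counters[haplogroup] = counters.get(haplogroup, 0) + 1
    name = f"{haplogroup}n{counters[haplogroup]}"
    known_names[key] = name  # the same profile gets the same name if it recurs
    return name


# Hypothetical lookup: this profile-to-name pairing is invented for the example.
known = {frozenset({"15459C>T"}): "A11"}
counters = {"A": 99}  # pretend 99 new haplogroup A haplotypes were already named

first = assign_name(["15459C>T"], "A", known, counters)   # published name
second = assign_name(["15462del"], "A", known, counters)  # new name
third = assign_name(["15462del"], "A", known, counters)   # reused new name
```

Here `second` follows the An100 pattern given in the text.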

Haplotype network
Due to significant differences in the nucleotide sequences of haplotypes across haplogroups, a separate haplotype network was constructed for each of the six haplogroups. The haplotype network of each haplogroup was built using the minimum spanning network method in PopART 1.7. The output of PopART was used as input for Gephi version 0.9 to estimate the centrality of each haplotype in the network.
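For readers unfamiliar with betweenness centrality, the following is a minimal sketch of Brandes' algorithm on an unweighted, undirected graph. It is not the Gephi implementation, it does not build a minimum spanning network, and the five-node toy graph is hypothetical. Scores are unnormalised, with each unordered pair of nodes counted once.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph maps each node to an iterable of its neighbours.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:         # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):       # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # undirected: halve


# Hypothetical toy topology: a hub with three neighbours and one extra tip.
toy = {
    "hub": ["a", "b", "c"],
    "a": ["hub", "d"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["a"],
}
bc = betweenness(toy)
```

The hub lies on the shortest path of five node pairs and node `a` on three, so these two receive the highest scores, while tip nodes score zero.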

Data collections
Using the BLAST tool against GenBank's nucleotide database, 6238 nucleotide sequences derived from individuals of Canis lupus showed over 90% similarity to the HV1 region of the reference sequence (GenBank U96639.2). Of these, sequences with ambiguous nucleotides (N) in the HV1 region were removed. The remaining 3946 sequences were grouped into 729 groups based on sequence similarity. Of these 729 sequence groups, 298 were identified as assigned haplotypes, and the remaining 431 carried new mutation profiles. It is worth noting that all the assigned haplotypes had been recognized in dogs, whereas some unassigned haplotypes had so far been found only in wolves. The majority (77%) of the collected sequences belonged to haplogroup A, consistent with previous studies. The three common haplogroups A, B, and C accounted for 97.3% of all sequences, whereas the three rare haplogroups D, E, and F accounted for only 2.7%.
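The cleaning and grouping step might look like the sketch below. It assumes that a group consists of identical sequences, which is an interpretation of "grouped based on sequence similarity" rather than something the text states, and the toy sequences are hypothetical. The last lines check that the counts reported above are internally consistent.

```python
from collections import Counter


def collapse_to_haplotypes(sequences):
    """Drop sequences with ambiguous bases (N), then count identical sequences."""
    clean = [s.upper() for s in sequences]
    clean = [s for s in clean if "N" not in s]
    return Counter(clean)


# Hypothetical toy sequences, not real HV1 data.
toy = ["ACGT", "acgt", "ACNT", "ACGA"]
groups = collapse_to_haplotypes(toy)
n_kept = sum(groups.values())

# Consistency check on the group counts reported in the text.
assigned, unassigned, total_groups = 298, 431, 729
assert assigned + unassigned == total_groups
```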

Haplogroup diversity
Although most of the collected sequences belonged to haplogroup A, all six haplogroups had similar haplotype diversity. Haplogroup F had the lowest haplotype diversity, about 0.83, whereas the other haplogroups ranged from 0.90 to 1.00. However, nucleotide diversity and the number of nucleotide differences varied considerably. The nucleotide diversity of haplogroup A was much greater than that of the other haplogroups, about twice that of haplogroups B and C. These analyses indicated that haplogroup A sequences were abundant and formed a diverse population. It can be inferred that haplogroup A was introduced into domestic dog populations very early, which supports Savolainen's hypothesis that a particular haplotype A (A29) is an ancient haplotype.
Even though haplogroups B and C showed the same level of haplotype diversity as haplogroup A, their nucleotide diversity was quite low. It is possible that these haplogroups were introduced into the dog population long after haplogroup A. Haplogroups E and F presented a unique scenario among the haplogroups. The low number of sequences collected, coupled with low haplotype and nucleotide diversity, suggested that these haplotypes had been introduced into the domestic dog population recently. This conclusion is supported by published studies, which indicated that dogs in haplogroups E and F are primarily found in specific regions such as Siberia, Japan, Korea, Indonesia, Thailand, and Vietnam.
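The two diversity measures discussed above can be made concrete with a short sketch using the standard definitions: haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and nucleotide diversity as the mean number of pairwise differences divided by sequence length. The text does not name the software used for these statistics, so this is an illustration on hypothetical toy data rather than a reproduction of the reported values.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (sequences must share one length)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy sample of four aligned 4-bp sequences.
sample = ["AAAA", "AAAA", "AAAT", "AATT"]
hd = haplotype_diversity(sample)
k = mean_pairwise_differences(sample)
pi = nucleotide_diversity(sample)
```

A sample can have high haplotype diversity and still low nucleotide diversity when its haplotypes are numerous but differ from each other at only one or two sites, which is the pattern described for haplogroups B and C.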

Haplotype network
Published studies have suggested that the various haplogroups were introduced into the dog population by different wolves harbouring specific haplotypes. Haplotypes in different haplogroups show notable differences in their nucleotide sequences. For ease of visualization, a separate haplotype network was built for each haplogroup in this study.
Node sizes were scaled according to betweenness.

Figure 1 Haplotype network of haplogroup A
The haplotype network of haplogroup A was complex, with many haplotypes and many connections among them. In this network, each haplotype is represented by a node, while a link between two nodes signifies a close relationship (one nucleotide mutation). The nodes in the network are also sized according to their betweenness value. Certain haplotypes, namely A3, A9, A11, A15, A18, A20, A29, A44, and A73, as well as the unassigned haplotypes An647 and An653, had notably high betweenness values and were easily recognized within the network. This observation emphasizes the importance of these haplotypes in the genetic structure of the dog population. An353 is a haplotype found only in wolves.

Figure 2 Haplotype network of haplogroup B
The haplotype network of haplogroup B had a clear structure with fewer haplotypes. Haplotypes B1, B2, B6, B10, and B20 stood out prominently within this network, and among them B1 and B2 played the most significant roles. Haplotype B1 occupied the central position, at most three intermediate haplotypes away from any other haplotype in the network. This network, along with the diversity values above, indicated that a limited number of haplotypes, potentially B1 and B2, were introduced into the dog population long after the introduction of haplotypes from haplogroup A.
As in haplogroup B, the haplotypes within haplogroup C were closely related to each other. The four main haplotypes, C1, C2, C3, and C17, each differed from an adjacent haplotype by one nucleotide and had the highest betweenness values among all the haplotypes, indicating their significance within the network. This observation suggests that at least one of these four haplotypes (as the founder haplotype) was likely introduced into the dog population in the same timeframe as the haplotypes from haplogroup B.
Haplogroup D consisted of a smaller number of haplotypes. As in haplogroups B and C, the assigned haplotypes with high betweenness, namely D3, D5, D6, and D1, were the significant ones within the network. In contrast, the unassigned haplotypes had relatively minor roles, indicating their recent emergence. Published studies provide a good explanation for this network. These haplotypes, which are distributed in restricted locations (Turkey, Spain, and Scandinavia), were seemingly introduced into the dog population long after the introduction of haplotypes from haplogroup A and hence have experienced fewer mutations.
The haplotype network of haplogroup E had a simple structure comprising only nine nodes. Among these nodes, haplotypes E2, En6, E1, En5, and En7 stood out with high betweenness values. The network indicated that E1 served as the ancestor of both E4 and E3. E1 and E4 differed by a single nucleotide at position 15955, while E1 and E3 differed by two nucleotides at positions 15464 (insertion) and 16003. Haplotypes E1 and E2 differed by five nucleotides, with an intermediate haplotype, En6, differing by two nucleotides from E1 and three nucleotides from E2. Interestingly, while E1 and E2 had been observed in both wolves and domestic dogs, En6 had been reported only in wolves. This suggested that the E haplotypes were likely introduced into the dog population relatively recently, without sufficient time for mutations to form new haplotypes. It is reasonable to assume that dogs harbouring haplogroup E haplotypes are genetically close to wolves.
A similar scenario was observed in the haplotype network of haplogroup F, which is narrowly distributed among Japanese dogs. Haplotype F3 appeared to be the founding haplotype of this haplogroup and had been incorporated into the canine population recently, likely originating from a wolf (or a small group of wolves) carrying the F3 haplotype.

Conclusion
Through the analysis of the haplotype networks and assessment of betweenness values, this study has identified important haplotypes contributing significantly to the dog population. These analyses offer valuable insights into the identification of founder haplotypes involved in the formation of dog breeds worldwide, serving as a valuable reference for breed development and genetic studies.

Figure 4 Haplotype network of haplogroup D

Figure 5 Haplotype network of haplogroup E (A) and haplogroup F (B)

Shannon et al. suggested Central Asia as the origin based on a comprehensive analysis of nuclear and mitochondrial DNA. In contrast, Vila et al. proposed an earlier domestication period of over 100,000 years ago. However, Savolainen et al. contradicted these ideas, suggesting a more recent domestication event around 15,000 years ago, with the East Asian domestic dog descending from a specific wolf lineage. Pang et al. (2009) argued that the studies by Vila et al. (1997) and Savolainen et al. (2002) had limitations due to small sample sizes, making it challenging to draw accurate global conclusions [2]. Pang et al. conducted extensive investigations using complete mitochondrial genome analysis of 169 dogs and CR region analysis of 1543 dogs from across the Old World. Their research revealed that domestication occurred approximately 16,300 years ago from a few hundred female wolves.

Table 1
Number of haplotypes

Table 2
Diversity indices of haplogroups

Table 4
Figure 3 Haplotype network of haplogroup C

Table 6
Haplotypes with highest betweenness value in network of haplogroup D (A), haplogroup E (B), haplogroup F (C)