DigDig identifies and reports repetitive peptide sequences within proteins, a feature often overlooked by conventional search engines. Such repetitions commonly arise in proteins with internal sequence duplications (e.g., human haptoglobin 2, prion protein) and are especially frequent when using non-specific proteases or analyzing large proteins.
For example, digestion of AHNK_HUMAN (a 630 kDa protein) with AnPEP yields multiple recurring peptides, including 20 instances of DWHLKMP and 12 of DLHLKGP, among many others. Most search tools ignore these duplicates, which leads to an underestimation of sequence coverage and gives no warning about ambiguity in peptide localization.
DigDig addresses this gap by scanning each identified protein for repeated occurrences of each peptide and reporting all matching positions. These repetitions are currently included in coverage visualizations. Future versions will offer options to toggle repetition analysis on or off, to highlight repeated sequences in the visualizations, and to exclude non-unique peptides from coverage metrics, since their position cannot be unambiguously assigned. The current version of DigDig can already exclude all occurrences of repetitive sequences; see the bottom of this page for details.
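The scanning step described above can be illustrated with a minimal Python sketch. It is not DigDig's actual implementation: the function names `find_occurrences` and `find_repetitions`, the 1-based inclusive positions, and the toy sequence are all assumptions made for illustration.

```python
def find_occurrences(protein_seq, peptide):
    """Return (start, end) of every occurrence of peptide in protein_seq.

    Positions are 1-based and inclusive (an assumption for this sketch).
    Overlapping occurrences are reported as well.
    """
    if not peptide:
        return []
    positions = []
    idx = protein_seq.find(peptide)
    while idx != -1:
        positions.append((idx + 1, idx + len(peptide)))
        # Advance by one residue so overlapping matches are not skipped.
        idx = protein_seq.find(peptide, idx + 1)
    return positions


def find_repetitions(protein_seq, peptides):
    """Map each peptide that occurs more than once to all of its positions."""
    repetitions = {}
    for peptide in set(peptides):
        occurrences = find_occurrences(protein_seq, peptide)
        if len(occurrences) > 1:
            repetitions[peptide] = occurrences
    return repetitions


# Toy sequence (hypothetical, not a real protein entry).
toy_protein = "AADWHLKMPGGDWHLKMPTT"
print(find_repetitions(toy_protein, ["DWHLKMP", "PTT"]))
```

In this toy case only DWHLKMP is reported, because PTT occurs once and its position is therefore unambiguous.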
Workflow
Upon execution, DigDig scans the search results and the associated FASTA file to identify repetitive peptide sequences within individual protein entries (demonstrated here on a repetitive stretch in the human haptoglobin 2 alpha subunit sequence). If the program is launched via a batch file with a command window open, users can monitor the progress of sequence matching and repetition detection in real time.
The entire workflow is briefly shown in this video.
Once the analysis is complete, a summary window listing the detected repetitions will appear, unless no repetitions are found, in which case this window does not open.

Users can copy the contents of the summary window to the clipboard; however, the window is mostly informative, as a more structured and exportable version of the results is available. After dismissing this window, the main application interface is displayed.
As mentioned above, a detailed export is available via the menu: File > Export repetitions.

The resulting CSV file has the following structure:
Protein Sequence Start End Count Files
Here, Protein contains the database entry description and Sequence shows the repeated peptide. Start and End give the sequence positions of the peptide, with each occurrence listed on its own line. Count shows in how many analyses the peptide was found, and Files lists the names of the corresponding analyses/search result files.
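For downstream processing, the exported file can be regrouped so that each repeated peptide carries all of its positions. The sketch below assumes a comma-delimited file whose header row uses exactly the column names listed above; the actual delimiter and header may differ. The function name `group_repetitions` and the example rows, including the `run1;run2` file list, are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_repetitions(csv_text, delimiter=","):
    """Group exported rows by (Protein, Sequence) and collect their positions.

    Assumes a header row with the columns Protein, Sequence, Start, End,
    Count and Files (an assumption about the export format).
    """
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        key = (row["Protein"], row["Sequence"])
        grouped[key].append((int(row["Start"]), int(row["End"])))
    return dict(grouped)


# Made-up example content; real exports will contain real entries.
example = """Protein,Sequence,Start,End,Count,Files
Toy protein,DWHLKMP,3,9,2,run1;run2
Toy protein,DWHLKMP,12,18,2,run1;run2
"""

for (protein, sequence), positions in group_repetitions(example).items():
    print(protein, sequence, positions)
```

To read a real export, the file contents can be passed in as text, with `delimiter` adjusted if the file turns out to be tab- or semicolon-separated.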
A graphical illustration is shown below, highlighting two repeated domains in haptoglobin 2, marked in yellow and blue. These domains differ by just two amino acids (DK, in red, and NE, in violet). The peptide map reveals a high degree of similarity in peptide distribution between the two regions. However, the overlap is not perfect, as many peptides extend into surrounding non-repetitive regions or include the above-mentioned amino acid substitutions.

The option to show or hide repetitive sequences was introduced during the review of the DigDig manuscript. At this stage, the feature does not yet provide the originally intended full functionality of completely excluding such peptides from both visualizations and metric calculations. Instead, it currently allows direct comparison (e.g., coverage maps) with repetitive sequences either included or excluded, while digestion metrics still fully incorporate them as in the default settings. This option can be triggered when submitting the visualization request by ticking the box “Leave out the repetitive peptides”.

This allows a direct comparison of, for example, the coverage map with (red bars) and without (blue bars) repetitive sequences, as shown for human haptoglobin 2.
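The effect of leaving out repetitive peptides on sequence coverage can be sketched as follows. This is an illustration under stated assumptions rather than DigDig's own calculation: coverage is taken as the fraction of residues covered by at least one peptide, positions are 1-based and inclusive, and the helper names and toy sequence are hypothetical.

```python
def occurrences(protein_seq, peptide):
    """Return 1-based inclusive (start, end) for every match, overlaps included."""
    found = []
    idx = protein_seq.find(peptide) if peptide else -1
    while idx != -1:
        found.append((idx + 1, idx + len(peptide)))
        idx = protein_seq.find(peptide, idx + 1)
    return found


def coverage(protein_len, intervals):
    """Fraction of residues covered by at least one interval."""
    covered = [False] * protein_len
    for start, end in intervals:
        for pos in range(start - 1, end):
            covered[pos] = True
    return sum(covered) / protein_len


def coverage_with_and_without_repeats(protein_seq, peptides):
    """Return (coverage with all peptides, coverage with unique peptides only)."""
    all_intervals, unique_intervals = [], []
    for peptide in set(peptides):
        occ = occurrences(protein_seq, peptide)
        all_intervals.extend(occ)
        if len(occ) == 1:  # position is unambiguous
            unique_intervals.extend(occ)
    n = len(protein_seq)
    return coverage(n, all_intervals), coverage(n, unique_intervals)


# Toy sequence (hypothetical): DWHLKMP occurs twice, PTT once.
with_repeats, without_repeats = coverage_with_and_without_repeats(
    "AADWHLKMPGGDWHLKMPTT", ["DWHLKMP", "PTT"]
)
print(with_repeats, without_repeats)
```

In the toy case, 16 of 20 residues are covered when repeats are included but only 3 of 20 when they are left out, which is the kind of gap the red and blue bars make visible.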

Further improvements in this area are planned for future versions, as outlined here.


