The sidebar is organized identically for the Cleavage map and the Redundancy plot, so it is described once for both types of visualization.
The top pull-down menu lists all identified proteins matched to their respective database entries. By default, the proteins are sorted by the number of identified peptides in descending order; the sorting can be changed to alphabetical. The “a→z” and “9→0” controls above the pull-down menu switch between the two sort orders. The Filter field allows users to search for specific proteins by the name defined in the database.
This short demonstration video illustrates how these fields facilitate efficient navigation of a complex dataset. In addition, it demonstrates the detection and visualization of repetitive sequences, as well as how DigDig handles reproducibility in hexaplicate analyses of human serum.
The field labeled Conditions lists the individual experimental conditions. Each name is derived automatically from the filename, as detailed under section 2. Input data. The number in brackets is the average number of peptides identified for the listed protein across all replicate analyses.
The Analyses field shows the individual files/analyses belonging to the condition selected in the field above. The analysis name is the full file name. The number before the analysis name is the number of peptides identified in that analysis. In this field, analyses can be removed (for the current protein or for all proteins in the list) or temporarily disabled, which is helpful when dealing with outliers. Temporarily disabling an analysis hides it from the processing of the current protein; it is restored once you select another protein. These functions are called via right-click within the Analyses field.
Both fields, Conditions and Analyses, allow one, several, or all entries to be selected, so several conditions and/or analyses can be pooled together. Multiple entries are selected by combining left-click with Shift or Ctrl.
The Reproducibility ruler lets the user set the reproducibility level and assess data consistency. A value of 0 means that all peptides from all analyses are shown, while 100 stands for full reproducibility. In the latter case, only peptides found in every replicate are used for the visualizations.
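The effect of the reproducibility level can be sketched in a few lines of Python. This is only an illustration, not DigDig's actual code: it assumes that each peptide is described by its (start, end) positions and that an intermediate level is read as the minimum percentage of selected analyses in which a peptide must occur. Only the two end points (0 and 100) are defined in the text above; the function name `filter_by_reproducibility` is hypothetical.

```python
import math
from collections import Counter


def filter_by_reproducibility(analyses, level):
    """Return the peptides found in at least `level` percent of the analyses.

    analyses: list of sets of (start, end) tuples, one set per replicate.
    level:    integer reproducibility level from 0 to 100.
    The handling of intermediate levels is an assumption of this sketch.
    """
    if not analyses:
        return set()
    # Count in how many analyses each peptide was identified.
    counts = Counter(p for peptides in analyses for p in set(peptides))
    # Level 0 still requires a peptide to be seen at least once.
    required = max(1, math.ceil(level * len(analyses) / 100))
    return {p for p, n in counts.items() if n >= required}


replicates = [
    {(1, 10), (5, 20), (15, 30)},
    {(1, 10), (5, 20)},
    {(1, 10), (40, 50)},
]
print(sorted(filter_by_reproducibility(replicates, 0)))    # all peptides
print(sorted(filter_by_reproducibility(replicates, 100)))  # shared by every replicate
```

With level 0 every identified peptide is kept, while with level 100 only the peptide present in all three example replicates remains.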
The last box, Statistics, presents the digestion metrics calculated from the selected data and the applied reproducibility filtering. The data within this box can be copied to the clipboard by selecting it with the mouse and pressing Ctrl+C.
Protein length is calculated from the database entry and is shown for information only. The other values shown here are calculated from the selected conditions/analyses and the applied filtering (Reproducibility).
Unique peptide number shows the number of peptides that pass the filtering criterion, i.e. the reproducibility level. The percentage in brackets is the fraction of all unique peptides identified in the above-selected conditions/analyses that pass the filter (this total count is also shown). This enables the user to assess the reproducibility of the digestion: the higher the percentage, the better. For reliable statistics, do not rely on a small number of replicates (n = 2 or 3); a much higher replicate count (10-20) is more trustworthy.
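The relation between the filtered count, the total count, and the percentage can be written out as a short Python sketch. The function name `unique_peptide_stats` and the (start, end) peptide representation are assumptions made for illustration and are not taken from DigDig.

```python
def unique_peptide_stats(filtered, all_unique):
    """Return (kept, total, percentage) for a filtered set of peptides.

    filtered:   peptides that passed the reproducibility filter.
    all_unique: all unique peptides found in the selected analyses.
    """
    kept = len(set(filtered))
    total = len(set(all_unique))
    # Guard against an empty selection to avoid division by zero.
    percentage = 100.0 * kept / total if total else 0.0
    return kept, total, percentage


all_peptides = {(1, 10), (5, 20), (15, 30), (40, 50)}
reproducible = {(1, 10), (5, 20)}
kept, total, pct = unique_peptide_stats(reproducible, all_peptides)
print(f"{kept} of {total} unique peptides ({pct:.1f}%)")
```

In this made-up example, two of four unique peptides pass the filter, which gives 50%.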
Average peptide length can be used as a rough estimate of peptide length. A more detailed view can be obtained from the peptide length distribution analysis, which is done on a separate tab named Lengths.
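As a rough illustration of why the average alone is only an estimate, the following Python sketch computes both the average length and a simple length distribution. It assumes peptides are given as inclusive (start, end) positions; the function names are hypothetical and do not describe how the Lengths tab is implemented.

```python
from collections import Counter


def peptide_lengths(peptides):
    """Length of each peptide, assuming inclusive (start, end) positions."""
    return [end - start + 1 for start, end in peptides]


def average_length(peptides):
    """Arithmetic mean of the peptide lengths (0.0 for an empty input)."""
    lengths = peptide_lengths(peptides)
    return sum(lengths) / len(lengths) if lengths else 0.0


def length_distribution(peptides):
    """Map each peptide length to the number of peptides of that length."""
    return Counter(peptide_lengths(peptides))


peptides = [(1, 10), (5, 20), (15, 30)]
print(average_length(peptides))
print(dict(length_distribution(peptides)))
```

Two very different peptide populations can share the same average, which is why the distribution gives the more detailed view.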
Sequence coverage displays the part of the sequence (database entry) covered by the peptides.
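One common way to compute such a value is sketched below in Python. It assumes 1-based, inclusive (start, end) peptide positions and that the result is reported as a percentage of the protein length; both are assumptions of this sketch, and `sequence_coverage` is a hypothetical name.

```python
def sequence_coverage(peptides, protein_length):
    """Percentage of residues covered by at least one peptide."""
    if protein_length <= 0:
        return 0.0
    covered = set()
    for start, end in peptides:
        # Positions are 1-based and inclusive in this sketch.
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Two overlapping peptides cover residues 1-20 of a 40-residue protein.
print(sequence_coverage([(1, 10), (5, 20)], 40))
```

Overlapping peptides do not increase the coverage, because each residue is counted only once.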
Cleavage efficiency is calculated from the number of cleaved peptide bonds. Hence, a value of 100% means that every bond in the protein was cleaved. However, this does not mean the protein was chopped into individual amino acids, only that the digestion result contains a mixture of numerous peptides whose cleavage sites together include every bond.
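The following Python sketch shows one way such a value could be derived and why 100% does not imply single amino acids. It is an assumption-based illustration, not DigDig's formula: it takes the number of distinct cleaved bonds divided by the protein length minus one, and treats each peptide terminus that is not a protein terminus as one cleaved bond. The name `cleavage_efficiency` is hypothetical.

```python
def cleavage_efficiency(peptides, protein_length):
    """Percentage of peptide bonds observed as cleaved.

    Bond i joins residues i and i + 1, so a protein of length L has L - 1 bonds.
    Peptides are 1-based, inclusive (start, end) tuples.
    """
    if protein_length < 2:
        return 0.0
    cleaved = set()
    for start, end in peptides:
        if start > 1:
            cleaved.add(start - 1)  # bond before the peptide N-terminus
        if end < protein_length:
            cleaved.add(end)        # bond after the peptide C-terminus
    return 100.0 * len(cleaved) / (protein_length - 1)


# Every bond of a 5-residue protein is cleaved, yet no peptide is a single residue.
print(cleavage_efficiency([(1, 3), (2, 5), (3, 5), (1, 4)], 5))
```

In the example, four peptides of three or four residues together account for all four bonds of the protein.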
Redundancy score displays the average number of times each residue was covered by a peptide. It is calculated for all amino acids in the protein or only for those covered by the peptides.
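The two variants of the score can be illustrated with a short Python sketch. It assumes 1-based, inclusive (start, end) peptide positions and computes the mean coverage depth once over the whole protein and once over the covered residues only; the function name `redundancy_scores` is hypothetical.

```python
def redundancy_scores(peptides, protein_length):
    """Return (score over all residues, score over covered residues)."""
    depth = [0] * protein_length
    for start, end in peptides:
        for pos in range(start, end + 1):
            depth[pos - 1] += 1  # convert 1-based position to list index
    total = sum(depth)
    covered = sum(1 for d in depth if d > 0)
    over_all = total / protein_length if protein_length else 0.0
    over_covered = total / covered if covered else 0.0
    return over_all, over_covered


over_all, over_covered = redundancy_scores([(1, 10), (5, 20)], 40)
print(over_all, over_covered)
```

The score over covered residues is never lower than the score over all residues, since uncovered positions only enlarge the denominator.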
Once the data are ready, their visualization is triggered via the Add to plot button. The peptides matching the selected conditions, analyses, and reproducibility filter are then used to plot the coverage map on the right side. Before creating the map, the program asks for a name for the conditions. By default, the name is pre-filled with the name of the Conditions and the reproducibility level. If the user repeatedly adds the same name, an automatic counter (01, 02, 03,…) is appended to the end of the description.
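The automatic counter can be pictured with the Python sketch below. The exact form of the suffix is not specified here, so the underscore separator, the example name, and the function name `unique_plot_name` are assumptions made purely for illustration.

```python
def unique_plot_name(name, existing, sep="_"):
    """Return `name`, or `name` plus a two-digit counter if it is already used.

    existing: collection of names already present in the plot.
    sep:      hypothetical separator placed before the counter.
    """
    if name not in existing:
        return name
    counter = 1
    while f"{name}{sep}{counter:02d}" in existing:
        counter += 1
    return f"{name}{sep}{counter:02d}"


used = set()
for _ in range(3):
    label = unique_plot_name("pepsin 50", used)
    used.add(label)
    print(label)
```

Adding the same name three times yields the plain name first and then two numbered variants.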
The plot can be exported via the Export plot button and saved as *.pdf, *.png, or *.svg. In addition, the data behind the visualization can be exported in text format through the Export data button. Data can be exported only if a map is visualized; export is not possible based solely on the filtering in the Sidebar. The exported file contains the name of the conditions and the reproducibility level in the first column. The second column, “Level”, gives the position of the peptide bar in the map: “0” stands for the first line, “1” for the second, etc. This may help in re-drawing the map in another program. The third and fourth columns contain the peptide limits (as numerical values).
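For re-drawing the map elsewhere, the exported text could be read along the lines of the Python sketch below. The tab delimiter, the presence of a header row, the header names other than “Level”, and the sample rows are all assumptions for illustration; check an actual export before relying on them. The function name `read_exported_map` is hypothetical.

```python
import csv
import io


def read_exported_map(text, delimiter="\t"):
    """Parse exported rows into (name, level, start, end) tuples.

    Rows with fewer than four columns or non-numeric values in
    columns 2-4 (for example a header row) are skipped.
    """
    bars = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) < 4:
            continue
        name, level, start, end = row[:4]
        try:
            bars.append((name, int(level), int(start), int(end)))
        except ValueError:
            continue  # header or malformed line
    return bars


# Invented sample content, not real DigDig output.
sample = "Name\tLevel\tStart\tEnd\npepsin 50\t0\t1\t10\npepsin 50\t1\t5\t20\n"
for name, level, start, end in read_exported_map(sample):
    # Level gives the row of the bar; start and end give its horizontal extent.
    print(name, level, start, end)
```

Each tuple holds what is needed to draw one bar: the row index from the “Level” column and the horizontal extent from the two peptide limits.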
