Jezero crater, Mars: application of the deep learning NOAH-H terrain classification system

ABSTRACT We applied a deep learning terrain classification system, the ‘Novelty or Anomaly Hunter – HiRISE’ (NOAH-H), originally developed for the ExoMars landing sites in Oxia Planum and Mawrth Vallis, to the Mars 2020 Perseverance rover landing site in Jezero crater. NOAH-H successfully classified the terrain in four HiRISE images of Jezero even though the landforms in the Jezero study area were slightly different from those in the training dataset. We mosaicked the NOAH-H classified rasters and compared them with a manually generated photogeological map, and with Perseverance rover and Ingenuity helicopter images. We find that grouped NOAH-H classes correspond well with the human-made map and that individual classes are corroborated by the available ground-truth images. We conclude that our NOAH-H products can be refined for feeding into traversability analysis of the ExoMars Rosalind Franklin rover landing site at Oxia Planum and that they can also be used to aid the photogeological mapping process.


Introduction
The Novelty or Anomaly Hunter – HiRISE (NOAH-H: Barrett et al., 2022) is a deep learning terrain classification system that was developed to support the European Space Agency's ExoMars Rosalind Franklin rover mission to Mars (Vago et al., 2017). Briefly, Barrett et al. (2022) manually assigned one of 14 human-defined terrain 'descriptive classes' to pixels within 1,504 framelets, each 128 × 128 m, in 25 cm/pixel red band High Resolution Imaging Science Experiment (HiRISE: McEwen et al., 2007) images of Oxia Planum (Quantin-Nataf et al., 2021) and Mawrth Vallis (Poulet et al., 2020) to form the NOAH-H training dataset. After training, NOAH-H could then autonomously apply the descriptive classes to entire HiRISE images in those regions. Together with topographic data and the engineering specifications of the rover, NOAH-H terrain classifications are intended to form a key component of traversability analyses. Barrett et al. (2022) also assembled related descriptive classes into thematic 'interpretive groups' conceived to emulate how a geomorphologist would interpret the landscape.
Here, we have applied NOAH-H to the interior of Jezero crater, Mars, to create terrain classification mosaics of the landing site of NASA's Mars 2020 Perseverance rover proximal to the crater's putative delta (Figure 1: Farley et al., 2020). Applying the NOAH-H model to Jezero serves as a test of its transferability. Our preliminary analysis of Jezero revealed terrains that fall within the descriptive classes defined by Barrett et al. (2022) for Oxia Planum and Mawrth Vallis (unlike the Gale crater NASA Curiosity rover site, which contains landforms such as large dark dunes not seen in the training dataset). We wanted to assess how well the model would classify morphologies that might deviate from the examples in the training dataset because of their different geographical setting.
The Mars 2020 team has produced a geological map of Jezero crater (Stack et al., 2020), against which we compare our NOAH-H results. Furthermore, Perseverance and the Ingenuity helicopter are actively returning images from Jezero. This means we can ground-truth NOAH-H before the ExoMars Rosalind Franklin rover's landing in Oxia Planum.

Materials and methods
We used four 25 cm/pixel red band HiRISE images (PSP_003798_1985, ESP_042315_1985, ESP_046060_1985, and ESP_048908_1985) to generate NOAH-H classified rasters of Jezero. Publicly available HiRISE images do not automatically project onto their correct positions on Mars when loaded into a Geographic Information System. To mosaic the NOAH-H rasters, we used the 25 cm/pixel Mars 2020 Terrain Relative Navigation HiRISE Orthorectified Image Mosaic ('the orthomosaic') produced for the Mars 2020 mission (Fergason et al., 2021) as the co-registration base for our HiRISE images. We then applied those co-registration transformations to the corresponding NOAH-H rasters. We illustrate some figures in this work with the 6 m/pixel Context Camera (CTX: Malin et al., 2007) mosaic (Fergason et al., 2021), but we did not use this product for analysis purposes. Figure 2 shows a NOAH-H raster where each pixel value corresponds to one of the 14 descriptive classes. For full details about how the NOAH-H rasters were produced, see Barrett et al. (2022).
We co-registered the raw HiRISE images to the orthomosaic using ArcGIS Pro 2.8 software by manually placing a trigonal grid of control points across each image. We placed additional control points around the margins of Jezero crater's delta front and impact crater rims to improve the co-registration of these topographic features. We also placed control points linking the current image being co-registered to previously co-registered images, where they overlap, to produce as seamless a mosaic as possible. We applied spline transformations to each HiRISE image so that each control point was co-registered exactly with the target mosaic. Then, for each HiRISE image, we applied the same control points and spline transformation to the corresponding NOAH-H raster. NOAH-H created the Jezero crater classified rasters in late 2019, before the orthomosaic was made publicly available (July 2020), which meant the orthomosaic was unavailable as a direct input for NOAH-H. Naturally there are offsets between our co-registered HiRISE and NOAH-H rasters and the orthomosaic in the spaces between control points. These offsets are most pronounced where there is a topographic offset between control points, such as the delta front. In such places the offset can be as much as 50 m, but such high-relief regions are lower priority for characterisation by NOAH-H since they are most likely already rover no-go zones based on their topography alone. On flatter ground, pixel offsets distant from control points are typically ∼1 m.
Figure 1. A 6 m/pixel Context Camera mosaic (Fergason et al., 2021; Malin et al., 2007) of the Jezero crater delta showing the location of the Mars 2020 Perseverance rover landing site. This view is a plate carrée projection. The inset in the lower-left shows the location of Jezero crater on Mars using an orthographic projection of the Mars Orbiter Laser Altimeter (MOLA: Smith et al., 2001) digital elevation model overlain on the MOLA hillshade.
Neighbouring co-registered NOAH-H rasters partially overlap each other. When classifying a given pixel, NOAH-H uses contextual information in the nearby pixels in the host HiRISE image only. Consequently, NOAH-H has less contextual information to classify image-edge pixels. Furthermore, pixels in regions of NOAH-H overlap are often not classified consistently. To create the NOAH-H mosaics, we cropped each raster edge by ∼120 pixels (30 m), and determined a priority order to resolve pixel classification conflicts. Where pixel conflicts occurred, we wanted the final mosaic to include the most reliable classifications and the classifications more hazardous for rover traversability. Barrett et al. (2022) summarise NOAH-H class identification reliability with three statistics: Precision, Recall, and Intersection over Union (IoU), which combines the first two. Precision, Recall, and IoU are given by the following formulae:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
where TP (true positive) equals the number of pixels correctly assigned to a class, FP (false positive) equals the number of pixels incorrectly assigned to that class, and FN (false negative) equals the number of pixels of that class that were incorrectly assigned elsewhere. Briefly, classes with high Precision are correctly identified (few false positives). If a class is present and NOAH-H finds a high fraction of it, then the class has high Recall (few false negatives). Classes with high IoU have few false positives and false negatives. We converted NOAH-H Precision, Recall, and IoU values for each descriptive class into unique scores from 1 to 14, with higher values yielding higher scores. We also converted the seven qualitative hazard assessments in Barrett et al. (2022) for each descriptive class into scores from 1 to 7. We added the scores for each descriptive class together to determine their mosaicking pixel priority, with higher overall scores having higher priority (Table 1).
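The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to build Table 1: the class names, pixel counts, and hazard scores are invented; the three statistics are assumed to be ranked separately; ties are broken alphabetically; and more hazardous classes are assumed to receive higher hazard scores.

```python
from typing import Dict, NamedTuple


class Counts(NamedTuple):
    tp: int  # pixels correctly assigned to the class
    fp: int  # pixels wrongly assigned to the class
    fn: int  # pixels of the class that were assigned elsewhere


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def iou(c: Counts) -> float:
    total = c.tp + c.fp + c.fn
    return c.tp / total if total else 0.0


def rank_scores(values: Dict[str, float]) -> Dict[str, int]:
    """Give each class a unique score 1..N; higher value -> higher score.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    ordered = sorted(values, key=lambda name: (values[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def pixel_priority(counts: Dict[str, Counts],
                   hazard: Dict[str, int]) -> Dict[str, int]:
    """Sum the Precision, Recall, and IoU rank scores with the hazard score."""
    total = {name: hazard[name] for name in counts}
    for metric in (precision, recall, iou):
        ranks = rank_scores({name: metric(c) for name, c in counts.items()})
        for name, score in ranks.items():
            total[name] += score
    return total


# Invented counts and hazard scores for three hypothetical classes.
counts = {
    "class_a": Counts(tp=90, fp=10, fn=10),
    "class_b": Counts(tp=50, fp=50, fn=0),
    "class_c": Counts(tp=20, fp=5, fn=60),
}
hazard = {"class_a": 1, "class_b": 2, "class_c": 3}

priority = pixel_priority(counts, hazard)
print(priority)  # {'class_a': 9, 'class_b': 8, 'class_c': 7}
```

With 14 descriptive classes the same functions would yield rank scores from 1 to 14 for each statistic, to which the 1 to 7 hazard score is added.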
Interpretive groups have internally variable hazard levels and higher, less variable Precision, Recall, and IoU than descriptive classes, which meant we could not apply the same pixel prioritisation strategy. Instead, we prioritised the 'distributed' interpretive groups, which generally do not conflict with each other, over the 'surface' interpretive groups (Table 2).
We used the ArcGIS Pro 'Lookup' and 'Mosaic to New Raster' tools to create NOAH-H descriptive class and interpretive group mosaics of Jezero, setting the 'Mosaic Operator' parameter to 'maximum' to ensure that where pixel classification conflicts occurred the pixel with the highest value, and hence the highest pixel priority, was included in the final mosaic.
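The cropping, lookup, and maximum-operator steps can be illustrated outside ArcGIS Pro with a small pure-Python sketch. It is not the ArcGIS implementation: it assumes the rasters are already co-registered onto one common grid, uses None for no-data, and assumes each class has a unique priority value so that the winning priority can be mapped back to a class. The function names, class IDs, and priority values are hypothetical.

```python
from typing import Dict, List, Optional

Raster = List[List[Optional[int]]]  # None marks no-data


def crop_edges(raster: Raster, margin: int) -> Raster:
    """Blank a margin of pixels around the raster edge, keeping the grid size."""
    h, w = len(raster), len(raster[0])
    return [
        [
            raster[r][c]
            if margin <= r < h - margin and margin <= c < w - margin
            else None
            for c in range(w)
        ]
        for r in range(h)
    ]


def lookup(raster: Raster, table: Dict[int, int]) -> Raster:
    """Replace each pixel value with its entry in the lookup table."""
    return [[None if v is None else table[v] for v in row] for row in raster]


def mosaic_max(rasters: List[Raster]) -> Raster:
    """Keep the highest value wherever rasters overlap; None where all are empty."""
    h, w = len(rasters[0]), len(rasters[0][0])
    out: Raster = []
    for r in range(h):
        row: List[Optional[int]] = []
        for c in range(w):
            values = [ras[r][c] for ras in rasters if ras[r][c] is not None]
            row.append(max(values) if values else None)
        out.append(row)
    return out


# Hypothetical class IDs (1-3) and priorities; the rasters overlap in column 1.
class_to_priority = {1: 10, 2: 30, 3: 20}
priority_to_class = {p: c for c, p in class_to_priority.items()}

raster_a: Raster = [[1, 2, None], [1, 2, None]]
raster_b: Raster = [[None, 3, 3], [None, 1, 3]]

by_priority = mosaic_max(
    [lookup(raster_a, class_to_priority), lookup(raster_b, class_to_priority)]
)
mosaic = lookup(by_priority, priority_to_class)
print(mosaic)  # [[1, 2, 3], [1, 2, 3]]
```

In the overlap column, class 2 (priority 30) wins over classes 3 and 1. For the cropping step, a 30 m margin at 25 cm/pixel corresponds to `crop_edges(raster, 120)`.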
The interpretive groups mosaic provides an overview of Jezero's terrain types, whereas the descriptive class mosaic is useful for studies of the details of the area. The interpretive groups are identified with higher Precision and Recall than the descriptive classes but lack the granularity of the descriptive classes. The use of both provides a more complete analysis of the site.

Results
The descriptive class and interpretive group NOAH-H mosaics for Jezero crater are shown in Figures 3 and 4, respectively. Overall, the model classified the terrain well despite not having been trained using HiRISE images of Jezero. The variety of training examples for the descriptive classes in Arabia Terra appears to have been broad enough to produce a generic model that repeatably identifies similar textures at Jezero (Figure 5a,b). Where errors occur, these are typically because of textures with characteristics of multiple classes in the transitional regions between distinct classes, for example between rugged, textured, and fractured bedrock. Such errors were described for Arabia Terra (Barrett et al., 2022), and would probably occur wherever the model would be applied.
While transverse aeolian ridges (TARs: Balme et al., 2008) at Jezero and Arabia Terra have similar endmember morphologies, Oxia Planum TARs are typically oriented NE-SW (Favaro et al., 2021), whereas large TARs at Jezero are primarily oriented N-S (Figure 5: Day & Dorn, 2019). Despite this, the large ripples descriptive classes (synonymous with TARs) were readily identified, meaning the model can recognise such textures irrespective of their orientation.
However, Jezero TARs also differ from those in Arabia Terra in terms of their relative abundances of the endmember morphologies. In the training dataset, rectilinear ripples are rare and occur only in small patches isolated from other ripple types. In Jezero, fields of parallel TARs occasionally contain patches of rectilinear ripples, which NOAH-H can correctly identify, but the model can make misclassifications where TAR morphologies appear intermediate between simple and rectilinear forms. For example, Jezero TAR ridge crests can have small, oblique or perpendicular, superposing ripples, giving the ridge crest a herringbone pattern absent in the training dataset. This is different enough from a sharp ridge that the model cannot identify such TARs as simple form ripples; however, they also lack the cell-like morphology of rectilinear ripples. Therefore, NOAH-H interprets the terrain based on its irregular pattern of high-relief material, without factoring in the aeolian context for such herringbone ridges that would lead a human to conclude they are simply a different expression of large continuous ripples. The result is that the model often incorrectly classifies such ripple crests as bedrock – rugged (Figure 5c). While rugged bedrock and herringbone ripple crest textures are similar on a purely descriptive level, they belong to different interpretive groups, making this a major misclassification.
We observe a less drastic error where NOAH-H misclassifies very small aeolian bedforms (Figure 5d). This texture is absent from the training dataset. A human geomorphologist can see, based on the context, that these bedforms are merely smaller than the type examples in the training dataset. However, it seems that the textural characteristics that NOAH-H relies on to identify aeolian bedforms are larger in scale than the wavelength of these ripples. Instead, the model classifies these features as non-bedrock – smooth, lineated, a class that also consists of relatively smooth material, with lineated patterns at or near the image resolution. We consider this a mistake, but it is encouraging that the model has classified this terrain unknown to it as a non-bedrock class, with an interpretation close to the truth.

Improving NOAH-H classifications at Jezero
While the model can distinguish complex patterns on a textural basis, its occasional errors show that it lacks the contextual understanding that human geomorphologists have. To produce a more reliable classification of Jezero, examples of its terrain variants would need to be included in the training dataset. However, this would both require more labelling work and would further complicate the classification scheme. Barrett et al. (2022) showed that the model can find rectilinear ripples with very high Precision and Recall. This was surprising, since these features are extremely rare in Arabia Terra, and so had very little support in the training dataset. If examples from Jezero had been included, then higher support for this class could have been achieved. However, this would not necessarily improve the classification accuracy of rectilinear ripples. Almost all of the examples in the Arabia Terra training dataset conform to a very similar morphology, making rectilinear ripples a very distinct class. A representative suite of examples from Jezero would include a far wider variety of transitional morphologies. This would allow these morphologies to be classified but could also dilute the presently very distinct class that represents the well-defined rectilinear pattern, potentially reducing the reliability overall. Such considerations should be made when deciding how broad to make a class, and where to draw the lines between different textures.
As a further example, the misclassification of small ripples as non-bedrock – smooth, lineated could be remedied with more training, with the probable result of expanding the small ripples classes at the expense of the non-bedrock classes. The important thing to judge for such a minor misclassification is whether the model performance improvement, if any, will be worth the time and effort involved in additional training.

NOAH-H comparison with human-made photogeological map
In preparation for Perseverance's landing, the Mars 2020 Science Team produced a photogeological map of the bedrock and surficial units in Jezero crater (Stack et al., 2020). The mapping area was divided into 166 1.2 × 1.2 km quads, which were shared among 63 mappers. The whole mapping process, from initial training, to quad mapping, to final map unification, was carried out between May 2019 and April 2020. Arguably, the most important product of this mapping process is the understanding of the sequence of geological events that occurred in the mapping area, which is communicated by the map. This understanding of the geological history of Jezero crater, which could only be realised over the course of numerous discussions, will be vital to scientific hypothesis development and strategic planning of the Mars 2020 mission. Although developing and refining the NOAH-H descriptive classes in Arabia Terra took six months (April-September 2018), no additional training was required before the model was transferred to Jezero crater. The typical classification time for a HiRISE image was ∼15 min. Mosaicking of the four Jezero NOAH-H rasters took less than one day. Therefore, NOAH-H products might be useful baseline products to create in advance of human-led geological mapping to expedite landing site characterisation. However, NOAH-H cannot replicate the most important functions of creating a geological map, such as determining the origin, formation sequence, evolution, and subsurface relationships of rock units. This is because NOAH-H merely classifies surface textures where they occur, without incorporating contextual information available to human geologists. Therefore, NOAH-H products are designed as inputs for traversability analyses and might also contribute to, but cannot replace, the conventional photogeological mapping process.
By using NOAH-H, we have produced a terrain classification map of Jezero crater, which is comparable in some ways to the Mars 2020 photogeological map. NOAH-H classifies surface textures, so the best Stack et al. (2020) product for a first-order comparison with NOAH-H is their surficial unit map, whose units are listed in Table 3. Stack et al. (2020) employed a level of surficial unit subdivision approximately equivalent to NOAH-H's interpretive groups. Barrett et al. (2022) derived the NOAH-H groups independently from the Stack et al. (2020) surficial units, so naturally they do not precisely correspond with each other. The Stack et al. (2020) map summarises HiRISE-scale information, and so they mapped the 'undifferentiated smooth' surficial unit according to its general coverage level. NOAH-H classified on a pixel-by-pixel basis, so a given interpretive group is either present in a pixel or not. To improve the correspondence between NOAH-H interpretive groups and the Stack et al. (2020) surficial units, we combined the three 'undifferentiated smooth' surficial units into one. Barrett et al. (2022) used a threshold across-crest width of 5 m to train NOAH-H to distinguish between small and large ripples. The large aeolian bedforms of Stack et al. (2020) are ∼10s to several 100s of metres long and <1 to ∼10 m across, and their small aeolian bedforms are a few 10s of metres long and ∼3 m across. Talus and boulder fields are not equivalent, but talus deposits in Jezero crater have been shown to host boulders (Sinha et al., 2020), so in our comparison large boulder fields on slopes are probably good proxies for talus. Where Stack et al. (2020) map no surficial unit, we assume exposed bedrock units are found in such areas, so we equate the absence of surficial units with the NOAH-H bedrock interpretive group. Figure 6 shows our simplified version of the Stack et al. (2020) surficial units map with the same symbology used for the NOAH-H interpretive group mosaic shown in Figure 4. Using NOAH-H we have created a remarkably similar product to Stack et al. (2020). NOAH-H has accurately reproduced the distribution of large aeolian bedforms across the area. Small aeolian bedforms have low areal coverage in both products. Boulder fields appear to be a good proxy for talus around the fringes of the Jezero delta. NOAH-H and the Stack et al. (2020) map broadly agree about the exposure of bedrock in Jezero. Many differences between the products result from the fact that NOAH-H classifies each pixel, whereas the Stack et al. (2020) map summarises information over areas.
Figure 6. Simplified surficial unit photogeological map of Jezero crater from Stack et al. (2020). We have symbolised units in this map to match similarly defined NOAH-H interpretive groups to ease comparison between these products (see Figure 4).
We have quantified our comparison between NOAH-H terrain classifications and the Stack et al. (2020) surficial unit map by creating a confusion matrix for where the two datasets overlap (Table 4). The confusion matrix shows how many pixels NOAH-H classified into each interpretive group within areas mapped as surficial units by Stack et al. (2020). If the NOAH-H interpretive groups and the Stack et al. (2020) surficial units had the same definitions and no misclassifications occurred, then all the pixel counts off the table diagonal (highlighted) would be zero, and Precision, Recall, and IoU would be 100%. In practice, misclassifications occur mainly because the NOAH-H interpretive groups and Stack et al. (2020) units do not correspond exactly, in addition to the kinds of misclassifications discussed in the Results section. As is apparent from a qualitative comparison between Figures 4 and 6, large ripples is the best performing interpretive group with high Precision, Recall, and IoU. The most common misclassification of large ripples is as bedrock, an example of which is shown in Figure 5c, but this accounts for only ∼9% of classifications. Non-bedrock also performed well, with bedrock being its most common misclassification, accounting for ∼23% of classifications. Bedrock has high Recall but low Precision, meaning that NOAH-H found much of the bedrock designated by Stack et al. (2020), but misclassified many other areas, particularly non-bedrock, as bedrock. As expected, the statistics for the spatially discontinuous groups small ripples and boulder fields are poor, since these classes could be offset by co-registration errors or because NOAH-H classifies at the pixel scale, whereas in the Stack et al. (2020) map much pixel-scale detail is lumped in with surrounding units. Additionally, small ripples are more often misclassified as non-bedrock, as seen in Figure 5d. The boulder fields group performed somewhat better, despite not being equivalent in definition to talus mapped by Stack et al. (2020).
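A confusion-matrix comparison of this kind can be sketched as follows. The sketch assumes two co-registered label sequences of equal length, treats the human-made map as the reference and NOAH-H as the prediction, and uses six invented pixels; it does not reproduce Table 4.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Confusion = Dict[Tuple[str, str], int]  # (reference, predicted) -> pixel count


def confusion(reference: Sequence[str], predicted: Sequence[str]) -> Confusion:
    """Count pixels for every (reference label, predicted label) pair."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    return Counter(zip(reference, predicted))


def class_metrics(cm: Confusion, label: str) -> Dict[str, float]:
    """Precision, Recall, and IoU of one label from the confusion counts."""
    tp = cm.get((label, label), 0)
    fp = sum(n for (ref, pred), n in cm.items() if pred == label and ref != label)
    fn = sum(n for (ref, pred), n in cm.items() if ref == label and pred != label)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "iou": tp / (tp + fp + fn) if (tp + fp + fn) else 0.0,
    }


# Six invented pixels: the reference map versus the model output.
reference = ["bedrock", "bedrock", "non-bedrock", "non-bedrock",
             "large ripples", "large ripples"]
predicted = ["bedrock", "bedrock", "bedrock", "non-bedrock",
             "large ripples", "bedrock"]

cm = confusion(reference, predicted)
print(class_metrics(cm, "bedrock"))
# {'precision': 0.5, 'recall': 1.0, 'iou': 0.5}
```

In this toy case every reference bedrock pixel is found (Recall of 1.0), but half of the pixels labelled bedrock belong to other units (Precision of 0.5), which is the same pattern described above for the bedrock group.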
The Stack et al. (2020) map corroborates NOAH-H terrain classifications at the interpretive group level. Ground-truth observations from Perseverance and Ingenuity are needed to test NOAH-H at the descriptive class level.

NOAH-H comparison with images from Perseverance and Ingenuity
Perseverance landed in terrain classified by NOAH-H as bedrock – fractured (Figure 7a). To impart a sense of scale, we have correlated the polygonal fracture pattern visible in HiRISE (Figure 7b) with a Perseverance NavCam image (Figure 7c,d). While HiRISE images can give the impression that fractured bedrock would be difficult for a rover to traverse because of apparent deep troughs between upstanding patches of bedrock, NavCam images confirm fractured bedrock is generally flat. The amount of non-bedrock material superposing fractured bedrock varies across the area of Perseverance's early traverse (Figure 7a), but Ingenuity images taken during the third flight corroborate NOAH-H's fractured bedrock classification near the landing zone (Figure 7e,f) despite the poorer exposure compared with the fractured bedrock seen by Perseverance on sol 66 (Figure 7b-d).
Ingenuity's ninth flight covered several NOAH-H terrain classifications (Figure 8). Toward the end of the ninth flight (Figure 8a), Ingenuity photographed rugged bedrock, which appears to be similar to fractured bedrock, but with higher relief (Figure 8b).
NOAH-H has identified boulder fields within the rugged bedrock, few of which are apparent in the Ingenuity image. This problem arises because boulder fields, even in the Arabia Terra training areas, are not a surface texture (such as bedrock) but are areas characterised by isolated boulders superposing a surface texture (Barrett et al., 2022). The more rugged the surface texture, the more likely NOAH-H is to classify the texture instead as a boulder field. Still, some loose-looking clasts in the foreground of Figure 8b are classified correctly as a boulder field. Figure 8c shows the start of Ingenuity's ninth flight, which, according to NOAH-H, began on textured non-bedrock material, and successively crossed rugged bedrock, the boundary between non-bedrock material to the left and a field of large ripples to the right, and an expansive large ripple field. Ingenuity images corroborate these terrain classifications. Figure 8d reveals that the finely textured non-bedrock is in fact a field of small ripples. The fact that NOAH-H partially classified this region as small ripples – continuous, and classified the rest mostly as either lineated or textured non-bedrock, corroborates our hypothesis that at the limit of HiRISE spatial resolution small ripples lose the defining geomorphic characteristics that NOAH-H requires for that terrain classification, resulting in a more generic non-bedrock classification.
From Figure 8e we interpret that the texture of the non-bedrock material NOAH-H identified at the takeoff zone is probably caused by underlying rugged bedrock and superposing ripples too small for NOAH-H to recognise. The rugged bedrock exposed here resembles the fractured bedrock at Perseverance's landing zone (Figure 7), but apparently has a greater density of float rocks.

Conclusions
We have used NOAH-H to create terrain classification maps of Jezero crater. Perseverance and Ingenuity images corroborate NOAH-H terrain classifications, despite Jezero landforms never having been incorporated into the model's training dataset. This demonstrates that NOAH-H can be used as a generic model that transfers to other geographical settings with similar landform assemblages. NOAH-H is not intended to replace the planetary geological mapping process, as it was designed for rover traversability assessment. Nevertheless, it clearly has an application as a tool to aid the mapping process, particularly when a map is needed for mission purposes and there is a strict timeframe for delivery. We are now creating NOAH-H products of the ExoMars Rosalind Franklin rover Oxia Planum landing site to feed into terrain traversability planning, in combination with topographic and rover engineering parameters.

Software
We used ESRI ArcGIS Pro 2.8 for co-registration of HiRISE and NOAH-H rasters, NOAH-H pixel prioritisation, and generation of the NOAH-H terrain classification maps of Jezero crater. NOAH-H is a deep learning semantic segmentation software developed by SciSys Ltd for the European Space Agency to aid preparation for the ExoMars Rosalind Franklin rover mission.

Open Scholarship
This article has earned the Center for Open Science badge for Open Data. The data are openly accessible at https://doi.org/10.21954/ou.rd.17121482.v1.