cs180-portfolio

proj1: prokudin-gorskii

Arnold Cai, Fall 2024

I first implemented a naive search that aligns the BGR channels by trying every offset in a fixed range of deltas and keeping the best offset for each channel. More specifically, I align the red and green channels with respect to blue. My initial delta range was [-15, 15], and I used two nested for loops to go through every combination of dx and dy. To decide whether a given alignment of g/r to b was the best one, I initially used the squared L2 norm as a loss function and picked the alignment that minimized it. However, after some testing, I found that the squared L2 norm was not a suitable metric. Therefore, I switched to Normalized Cross-Correlation (NCC) as the comparison metric, which I found to be more promising. Note that NCC is not a loss function but a score of the correlation between two vectors, so the search becomes a maximization problem.

NCC implementation:

  1. flatten 2d arrays to 1d arrays for faster computation
  2. normalize the flattened arrays
  3. compute dot product of the arrays for metric
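
The three steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name `ncc` and the small epsilon that guards against division by zero are assumptions. It follows the list literally (no mean subtraction, which is a common variant the write-up does not mention).

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation score between two same-shaped channels."""
    # 1. flatten the 2D arrays to 1D
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    # 2. normalize each vector to unit length (epsilon is an assumed safeguard)
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    # 3. the dot product of the two unit vectors is the score (higher is better)
    return float(np.dot(a, b))
```

Because both vectors are scaled to unit length, the score does not change when one channel is uniformly brighter than the other, and it reaches 1 only when the two channels are proportional.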

optimization

Another slight optimization was to compute np.roll() with respect to dx only once per iteration of the outer for loop, and then apply np.roll() with respect to dy to the already dx-rolled image in the inner loop. That way each inner iteration needs one roll instead of two.

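import numpy as np

for dx in range(-15, 16):
    rolled_x = np.roll(img, dx, axis=1)         # roll dx once per outer iteration (img: one channel; axis choice assumed)
    for dy in range(-15, 16):
        rolled = np.roll(rolled_x, dy, axis=0)  # roll dy on the already dx-rolled image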
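
Filling in the scoring step, the nested loop might look like the sketch below. The names `align_naive` and `ncc`, the `(dx, dy)` return order, and the mapping of dx to the horizontal axis are assumptions made for illustration.

```python
import numpy as np

def ncc(a, b):
    """NCC score: dot product of the two flattened, unit-length channels."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / (np.linalg.norm(a) + 1e-12),
                        b / (np.linalg.norm(b) + 1e-12)))

def align_naive(channel, reference, radius=15):
    """Return the (dx, dy) in [-radius, radius] that maximizes NCC against reference."""
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        # Horizontal roll, computed once per dx (assumes dx is the column shift).
        rolled_x = np.roll(channel, dx, axis=1)
        for dy in range(-radius, radius + 1):
            # Vertical roll applied to the already dx-rolled image.
            candidate = np.roll(rolled_x, dy, axis=0)
            score = ncc(candidate, reference)
            if score > best_score:  # NCC is a score, so keep the maximum
                best_score, best_shift = score, (dx, dy)
    return best_shift
```

With a radius of 15 this performs 31 horizontal rolls plus 961 vertical rolls, instead of 961 of each.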

caveats

When I initially implemented this naive search, cathedral.jpg and tobolsk.jpg worked fine. However, monastery.jpg had a weird offset between the BGR channels, making the image look like a 3D image viewed without 3D glasses. I figured that the white/black borders might have affected the NCC metric and caused this offset, especially since the middle channel doesn’t have a top and bottom white border like the other channels. Therefore, I tried cropping the original images by 2.5% on all sides before running the naive search. This fixed the offsets between the BGR channels in monastery.jpg.
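
A fixed-percentage crop like the one described above can be sketched in a few lines. The helper name `crop_borders` and the rounding of the margin to whole pixels are assumptions; only the 2.5% figure comes from the text.

```python
import numpy as np

def crop_borders(image: np.ndarray, fraction: float = 0.025) -> np.ndarray:
    """Trim `fraction` of the height and width from every side of an image."""
    h, w = image.shape[:2]
    dy = int(round(h * fraction))  # rows removed from the top and from the bottom
    dx = int(round(w * fraction))  # columns removed from the left and from the right
    return image[dy:h - dy, dx:w - dx]
```

For a 400 x 200 image this removes 10 rows from the top and bottom and 5 columns from the left and right.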

cathedral.jpg

monastery.jpg

tobolsk.jpg

| image | red dx | red dy | green dx | green dy | runtime (seconds) |
| --- | --- | --- | --- | --- | --- |
| cathedral.jpg | 3 | -4 | 2 | -5 | 0.429 |
| monastery.jpg | 2 | -9 | 2 | -9 | 0.447 |
| tobolsk.jpg | 3 | -6 | 2 | -3 | 0.424 |

task 2: pyramid image

I implemented image pyramid processing recursively in order to efficiently process the larger .tif files. Initially, I recursively scaled the image down by a factor of 2 until I hit the base case, where the image width is less than 100px. Once I hit the base case, I run the naive NCC alignment on it. Afterwards, I scale the deltas derived from the base case by 2 at each recursive layer, since the image was scaled down by a factor of 2 at each layer.
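
The recursion described above might be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function names are made up, and the downscaling here simply keeps every other pixel, since the write-up does not say which resizing method was used.

```python
import numpy as np

def ncc(a, b):
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / (np.linalg.norm(a) + 1e-12),
                        b / (np.linalg.norm(b) + 1e-12)))

def align_naive(channel, reference, radius):
    """Exhaustive (dx, dy) search that maximizes NCC."""
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        rolled_x = np.roll(channel, dx, axis=1)
        for dy in range(-radius, radius + 1):
            score = ncc(np.roll(rolled_x, dy, axis=0), reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def align_pyramid(channel, reference, min_width=100, base_radius=15):
    """Search only at the smallest level, then double the deltas on the way back up."""
    if channel.shape[1] < min_width:
        return align_naive(channel, reference, base_radius)  # base case
    # Halve the resolution by keeping every other pixel (assumed downsampler).
    dx, dy = align_pyramid(channel[::2, ::2], reference[::2, ::2],
                           min_width, base_radius)
    # One pixel at the coarser level spans two pixels at this level.
    return 2 * dx, 2 * dy
```

Without any refinement on the way back up, the result is only accurate to within the downscaling factor of the base level.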

optimizations

In order to improve the alignment, I ran the naive alignment algorithm at each recursive layer to fine-tune the results returned from the lower recursive layers. The delta range searched grows by 2px with each layer of recursion. For example, I start with a delta range of [-2, 2] at the top layer, and by the base case the delta range would be [-20, 20] if the pyramid function recursed 9 times.
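
One way to read this refinement is sketched below: each layer doubles the estimate from the layer beneath it, applies it, and then runs a small naive search for the leftover offset. The names, the `depth` parameter, the every-other-pixel downsampling, and the formula `2 + 2 * depth` (which reproduces the [-2, 2] to [-20, 20] example) are assumptions.

```python
import numpy as np

def ncc(a, b):
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    return float(np.dot(a / (np.linalg.norm(a) + 1e-12),
                        b / (np.linalg.norm(b) + 1e-12)))

def align_naive(channel, reference, radius):
    best_score, best_shift = -np.inf, (0, 0)
    for dx in range(-radius, radius + 1):
        rolled_x = np.roll(channel, dx, axis=1)
        for dy in range(-radius, radius + 1):
            score = ncc(np.roll(rolled_x, dy, axis=0), reference)
            if score > best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift

def search_radius(depth):
    """Delta range grows by 2px per recursive layer: depth 0 -> 2, depth 9 -> 20."""
    return 2 + 2 * depth

def align_pyramid_refined(channel, reference, min_width=100, depth=0):
    radius = search_radius(depth)
    if channel.shape[1] < min_width:
        return align_naive(channel, reference, radius)  # base case
    dx, dy = align_pyramid_refined(channel[::2, ::2], reference[::2, ::2],
                                   min_width, depth + 1)
    dx, dy = 2 * dx, 2 * dy  # scale the coarse estimate up to this layer
    # Apply the coarse estimate, then search a small window for the residual.
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    ddx, ddy = align_naive(shifted, reference, radius)
    return dx + ddx, dy + ddy
```

The small window at the full-resolution layer keeps the expensive searches cheap, while the wider windows are only used on the smallest images.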

caveats

The images were all scanned differently, and therefore all have different white/black border sizes. Hence applying one generalized crop percentage to all images doesn’t really work.

Initially, I tried using a white mask and diffing it with the image in order to eliminate the borders. It was pretty impressive and did get rid of the borders. However, the cropped images of the BGR channels all ended up with different dimensions and were hard to manipulate afterwards due to the mismatch.
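
The write-up does not show how the white mask was built, so the following is only a guess at the general idea: mark near-white pixels, then keep the rows and columns that are not mostly white. The function name and both thresholds are hypothetical. It also shows why the dimensions end up mismatched, since each channel gets its own bounding box.

```python
import numpy as np

def content_bbox(channel, white=0.9, max_white_fraction=0.5):
    """Return (top, bottom, left, right) of the region that is not mostly white.

    Assumes pixel values in [0, 1] and at least one non-border row and column.
    """
    mask = channel >= white  # True where a pixel looks like white border
    rows = np.where(mask.mean(axis=1) <= max_white_fraction)[0]
    cols = np.where(mask.mean(axis=0) <= max_white_fraction)[0]
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

# Two synthetic channels with different borders produce different crop sizes.
g = np.full((20, 30), 0.5)
g[:2] = 1.0      # white border on top
g[:, :3] = 1.0   # white border on the left
b = np.full((20, 30), 0.5)
b[:4] = 1.0      # thicker white border on top only

for name, ch in (("g", g), ("b", b)):
    top, bottom, left, right = content_bbox(ch)
    print(name, ch[top:bottom, left:right].shape)
```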

Therefore, I ended up switching to generalized percentage cropping. I cropped the initial image (while it still contains all 3 channel images) by 2% of its height and 5% of its width on each side. This is because the borders only exist around the initial image. If I cropped each individual image after dividing the initial image into 3 pieces, I would get weird crops, since the middle channel image doesn’t have white/black borders on the top and bottom.

However, this set of crop dimensions didn’t work for emir.tif and lady.tif. After some fine-tuning, I found specific crop percentages for each image. For emir.tif, the crop was 1.5% of the height on the top and bottom, 4% of the width on the left side and 2% on the right side. For lady.tif, the crop was 3% of the height and 5% of the width on each side.
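
The cropping scheme above could be expressed as a default plus per-image overrides, applied to the full plate before it is split into three. The percentages come from the text (reading the emir.tif figures as top/bottom height and left/right width). The names `DEFAULT_CROP`, `CROP_OVERRIDES`, `crop_plate`, and `split_channels` are hypothetical, and the top-to-bottom B, G, R order is assumed.

```python
import numpy as np

# (top, bottom, left, right) as fractions of the plate's height / width.
DEFAULT_CROP = (0.02, 0.02, 0.05, 0.05)
CROP_OVERRIDES = {
    "emir.tif": (0.015, 0.015, 0.04, 0.02),
    "lady.tif": (0.03, 0.03, 0.05, 0.05),
}

def crop_plate(plate: np.ndarray, name: str) -> np.ndarray:
    """Crop the full plate (all three exposures still stacked) by percentage."""
    top, bottom, left, right = CROP_OVERRIDES.get(name, DEFAULT_CROP)
    h, w = plate.shape[:2]
    return plate[round(h * top):h - round(h * bottom),
                 round(w * left):w - round(w * right)]

def split_channels(plate: np.ndarray):
    """Split the cropped plate into three equal-height channels (assumed B, G, R)."""
    third = plate.shape[0] // 3
    return plate[:third], plate[third:2 * third], plate[2 * third:3 * third]
```

Cropping before the split means the three channels always come out with identical dimensions, which avoids the mismatch seen with mask-based cropping.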

cathedral.jpg

church.jpg

emir.jpg

harvesters.jpg

icon.jpg

lady.jpg

melons.jpg

monastery.jpg

onion_church.jpg

sculpture.jpg

self_portrait.jpg

three_generations.jpg

tobolsk.jpg

train.jpg

| image | red dx | red dy | green dx | green dy | runtime (seconds) |
| --- | --- | --- | --- | --- | --- |
| cathedral.jpg | 3 | -14 | 2 | -8 | 0.044 |
| church.jpg | -4 | -197 | 0 | -104 | 4.722 |
| emir.jpg | 41 | -89 | 23 | -47 | 5.253 |
| harvesters.jpg | 13 | -131 | 16 | -68 | 5.085 |
| icon.jpg | 23 | -170 | 17 | -89 | 5.555 |
| lady.jpg | 10 | -265 | 8 | -135 | 5.005 |
| melons.jpg | 12 | -78 | 8 | -47 | 5.115 |
| monastery.jpg | 2 | -23 | 2 | -16 | 0.039 |
| onion_church.jpg | 36 | -148 | 26 | -76 | 5.026 |
| sculpture.jpg | -26 | -121 | -11 | -97 | 5.414 |
| self_portrait.jpg | 35 | -85 | 26 | -54 | 5.309 |
| three_generations.jpg | 9 | -143 | 11 | -74 | 5.322 |
| tobolsk.jpg | 3 | -20 | 3 | -10 | 0.040 |
| train.jpg | 32 | -169 | 5 | -86 | 4.905 |

bonus: auto contrast

I tried restoring some of the contrast of the images by equalizing the normalized distribution of pixel values in each channel. I attempted to manually implement Contrast Limited Adaptive Histogram Equalization (CLAHE). Shadows and some colors became more apparent after the contrast adjustment.
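
To make the idea concrete, here is a simplified stand-in: global histogram equalization of one channel with an optional clip limit. It is not full CLAHE (there is no tiling or interpolation between tiles) and it is not the project's implementation. The function name, the bin count, and the `clip_limit` semantics (a multiple of the mean bin count) are assumptions.

```python
import numpy as np

def equalize_channel(channel, bins=256, clip_limit=None):
    """Histogram-equalize one channel whose values lie in [0, 1]."""
    hist, edges = np.histogram(channel, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(np.float64)
    if clip_limit is not None:
        # Contrast limiting: cap each bin, then spread the clipped excess evenly.
        cap = clip_limit * hist.mean()
        excess = np.sum(np.maximum(hist - cap, 0.0))
        hist = np.minimum(hist, cap) + excess / bins
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]  # normalized cumulative distribution in [0, 1]
    # Map every pixel through the CDF, interpolating between bin right edges.
    out = np.interp(channel.ravel(), edges[1:], cdf)
    return out.reshape(channel.shape)
```

Applied to each of the three aligned channels separately, this stretches a narrow band of intensities across the full range. A lower `clip_limit` keeps the stretch milder.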

cathedral.jpg

church.jpg

emir.jpg

harvesters.jpg

icon.jpg

lady.jpg

melons.jpg

monastery.jpg

onion_church.jpg

sculpture.jpg

self_portrait.jpg

three_generations.jpg

tobolsk.jpg

train.jpg
