Arnold Cai, Fall 2024
I first implemented a naive version that aligns the BGR channels by searching over a fixed range of displacements and picking the best set of deltas. More specifically, I align the red and green channels with respect to blue. My initial delta range was [-15, 15], and I used two nested for loops to go through all possible combinations of dx and dy. To determine whether a given alignment of g/r to b was the best one, I initially used the L2 norm squared as a loss function, taking the alignment that minimized the loss as the "best" one. However, after some testing, I found that the L2 norm squared was not a suitable metric, so I switched to Normalized Cross-Correlation (NCC) as the comparison metric, which I found to be more promising. Note, however, that NCC is not a loss function but a scoring metric that measures the correlation between two vectors, so alignment becomes a maximization problem.
NCC implementation: a minimal sketch is given after the loop skeleton below.

Another slight optimization I implemented was to only compute np.roll() on the input image with respect to dx in the outer loop, and then apply np.roll() with respect to dy to the already dx-rolled image in the inner loop. That way, fewer redundant computations are made:
```python
for dx in range(-15, 16):
    rolled_dx = np.roll(channel, dx, axis=1)        # roll horizontally once per dx
    for dy in range(-15, 16):
        candidate = np.roll(rolled_dx, dy, axis=0)  # reuse the dx-rolled image
```
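Putting the pieces together, here is a minimal sketch of what the NCC score and the naive search could look like in NumPy. The names `ncc` and `align_naive`, and the exact normalization, are my reconstruction rather than the original code:

```python
import numpy as np

def ncc(a, b):
    # Flatten, zero-mean, and unit-normalize both channels, then take the
    # dot product. Higher is better, so alignment maximizes this score.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

def align_naive(channel, reference, search=15):
    # Exhaustively try every (dx, dy) in the window and keep the best NCC.
    best_score, best_delta = -np.inf, (0, 0)
    for dx in range(-search, search + 1):
        rolled_dx = np.roll(channel, dx, axis=1)
        for dy in range(-search, search + 1):
            candidate = np.roll(rolled_dx, dy, axis=0)
            score = ncc(candidate, reference)
            if score > best_score:
                best_score, best_delta = score, (dx, dy)
    return best_delta
```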
When I initially implemented this naive search, cathedral.jpg and tobolsk.jpg worked fine. However, monastery.jpg had a weird offset between the BGR channels, making the image look like one of those 3D pictures viewed without 3D glasses. I figured that the white/black borders might have skewed the NCC metric and produced this offset result, especially since the middle channel doesn't have top and bottom white borders like the other channels. Therefore, I tried cropping the original images by 2.5% on all sides before running the naive search. This fixed the offsets between the BGR channels in monastery.jpg.
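As a sketch, that pre-alignment crop could look like the helper below (the name crop_percent is hypothetical):

```python
def crop_percent(img, frac=0.025):
    # Drop `frac` of the height and width from every side of the image.
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```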
Aligned results: cathedral.jpg, monastery.jpg, tobolsk.jpg
image | red dx | red dy | green dx | green dy | runtime (seconds) |
---|---|---|---|---|---|
cathedral.jpg | 3 | -4 | 2 | -5 | 0.429 |
monastery.jpg | 2 | -9 | 2 | -9 | 0.447 |
tobolsk.jpg | 3 | -6 | 2 | -3 | 0.424 |
I implemented pyramid image processing recursively in order to efficiently process the larger .tif files. Initially, I simply scaled the image down recursively by a factor of 2 until hitting the base case, where the image width is less than 100px. At the base case, I run the naive NCC alignment algorithm. Afterwards, I scale the deltas derived from the base case by 2 at each recursive layer on the way back up, since the image was downscaled by a factor of 2 at each layer.
To further refine the result, I run the naive alignment algorithm at each recursive layer to fine-tune the deltas returned from the layer below. The delta range searched grows by 2px per layer as the recursion proceeds: for example, the top layer uses a delta range of [-2, 2], and by the base case the range would be [-20, 20] if the pyramid function recursed 9 times.
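Here is a minimal sketch of that recursive scheme, assuming skimage.transform.rescale for downscaling (any 2x downsample would do) and the ncc/align_naive helpers sketched earlier:

```python
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, reference, search=2):
    # Base case: the image is small enough for an exhaustive naive search.
    if channel.shape[1] < 100:
        return align_naive(channel, reference, search)
    # Recurse on half-resolution copies, widening the window by 2px per layer.
    dx, dy = align_pyramid(rescale(channel, 0.5), rescale(reference, 0.5), search + 2)
    dx, dy = 2 * dx, 2 * dy  # undo this layer's factor-of-2 downscale
    # Fine-tune around the upscaled estimate at this layer's resolution.
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fdx, fdy = align_naive(shifted, reference, search)
    return dx + fdx, dy + fdy
```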
The images are all scanned differently and therefore have white/black borders of different sizes, so applying one generalized crop percentage to every image doesn't really work.

Initially, I tried building a white mask and diffing it with the image to eliminate the borders. It was pretty effective and did get rid of the borders. However, a problem arose: the cropped images of the BGR channels all had different dimensions, which made them hard to manipulate afterwards due to the mismatch.
Therefore, I ended up switching to generalized percentage cropping. I crop the initial plate (while it still contains all 3 images) by 2% of its height and 5% of its width on each side, since the borders come only from the initial scan. If I instead cropped each individual image after splitting the plate into 3 pieces, I would get weird crops, since the middle channel image doesn't have white/black borders on its top and bottom.
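As a rough sketch, that crop of the full plate (before splitting it into B/G/R thirds) could look like this; the helper name crop_plate and its per-side parameters are hypothetical:

```python
def crop_plate(plate, top=0.02, bottom=0.02, left=0.05, right=0.05):
    # Crop the stacked 3-image plate before splitting it into channels,
    # since the scan borders belong to the plate, not the middle channel.
    h, w = plate.shape[:2]
    return plate[int(h * top): h - int(h * bottom),
                 int(w * left): w - int(w * right)]
```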
However, this set of crop dimensions didn't work for emir.tif and lady.tif. After some fine-tuning, I found specific crop percentages for each image. For emir.tif, I crop 1.5% of the height on both the top and bottom, 4% of the width on the left side, and 2% on the right. For lady.tif, I crop 3% of the height and 5% of the width on each side.
Aligned results: cathedral.jpg, church.jpg, emir.jpg, harvesters.jpg, icon.jpg, lady.jpg, melons.jpg, monastery.jpg, onion_church.jpg, sculpture.jpg, self_portrait.jpg, three_generations.jpg, tobolsk.jpg, train.jpg
image | red dx | red dy | green dx | green dy | runtime (seconds) |
---|---|---|---|---|---|
cathedral.jpg | 3 | -14 | 2 | -8 | 0.044 |
church.jpg | -4 | -197 | 0 | -104 | 4.722 |
emir.jpg | 41 | -89 | 23 | -47 | 5.253 |
harvesters.jpg | 13 | -131 | 16 | -68 | 5.085 |
icon.jpg | 23 | -170 | 17 | -89 | 5.555 |
lady.jpg | 10 | -265 | 8 | -135 | 5.005 |
melons.jpg | 12 | -78 | 8 | -47 | 5.115 |
monastery.jpg | 2 | -23 | 2 | -16 | 0.039 |
onion_church.jpg | 36 | -148 | 26 | -76 | 5.026 |
sculpture.jpg | -26 | -121 | -11 | -97 | 5.414 |
self_portrait.jpg | 35 | -85 | 26 | -54 | 5.309 |
three_generations.jpg | 9 | -143 | 11 | -74 | 5.322 |
tobolsk.jpg | 3 | -20 | 3 | -10 | 0.040 |
train.jpg | 32 | -169 | 5 | -86 | 4.905 |
I tried restoring some of the contrast of the images by equalizing the normalized distribution of pixel values in each channel. I attempted to manually implement Contrast Limited Adaptive Histogram Equalization (CLAHE). Shadows and some colors became more apparent after the contrast adjustment.
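As a simplified sketch of the idea, here is a global contrast-limited equalization of one channel; real CLAHE additionally works on local tiles and bilinearly interpolates between their mappings, which this sketch omits. The helper name and parameters are mine:

```python
import numpy as np

def clipped_equalize(channel, clip_limit=0.01, nbins=256):
    # Histogram-equalize a float channel in [0, 1], clipping the histogram
    # so that no single bin dominates (the "contrast limited" part).
    hist, bin_edges = np.histogram(channel, bins=nbins, range=(0.0, 1.0))
    hist = hist.astype(float) / channel.size
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    hist = np.minimum(hist, clip_limit) + excess / nbins  # redistribute excess
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]  # normalize so outputs land in [0, 1]
    return np.interp(channel.ravel(), bin_edges[:-1], cdf).reshape(channel.shape)
```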
Contrast-adjusted results: cathedral.jpg, church.jpg, emir.jpg, harvesters.jpg, icon.jpg, lady.jpg, melons.jpg, monastery.jpg, onion_church.jpg, sculpture.jpg, self_portrait.jpg, three_generations.jpg, tobolsk.jpg, train.jpg