Tuesday, August 11, 2009

Notes on the PS3 wireless keypad and Linux...

The "PS3" Wireless Keypad is actually a small Bluetooth(r) keyboard+mouse.

To get it into pairing mode with other devices, hold down the blue button, turn the power on, and wait for the blinking lights. Then you can pair with it normally: set a PIN and, as with most other Bluetooth keyboards, type the PIN on the keypad and press Enter.

And no, you can't use it as a USB keyboard.

I'm not quite sure how the modifier keys work yet, and how to properly map it. More details will come later as I figure out how to actually use it. ;) I've made a little progress using the "input-events" program (ubuntu: input-utils pkg)

As a general keyboard there are definitely "holes" in the keymap. The blue(left) button works as left shift for the keys it feels like working for, and the right is right alt.

However, they're only active for some keys, and those keys get remapped.

Therefore, to make this thing *actually* work, one needs to map other keys in as CTRL and ALT. The two buttons next to the keypad select switch (which map as F24 and F23) should work nicely for this.

The Linux input drivers can't do remapping on this, but x.org's event driver can.

xmodmap -e "keycode 202 = Control_L" < that *should* have worked, but the actual modifier isn't kicking in. Argh. But I'm tired, so that's a problem for another day...

Friday, August 07, 2009

mini 6502 emulator v0.001

I checked my very early 6502 source into google code (project: chadslab) - it's not debugged at all, but it's the smallest emulator I know of.

tl;dr the 6502's logic is really elegant and I felt like trying to write a .c file out of it. :)

Wednesday, July 22, 2009

ubuntu SSD hints

Preface: Any good CompactFlash card will work as an IDE drive with a very very cheap adapter. CF operates in three modes - memory map?, PCMCIA, and true IDE mode. siliconkit has one good line, monoprice has a decent cheap one as usual, and newegg has some too. For those who never want to buy old 80-wire IDE kit again, you can also get CF->SATA adapters for ~$20-30.

TL;DR flash-based hard drives have existed for years right under your nose, and they are dirt cheap.

---

I've always liked the idea of SSD's, and now with cheap 4GB CF's (Costco blew them out for $10/ea a while back) you can fully install ubuntu, as long as you don't plan to rip DVD's, load eclipse with every plugin known to man, or things like that.

However, the typical Sandisk Ultra II card is 'only' 15MB/sec and isn't fully optimized for SSD use. One can get the faster UDMA-supporting Extreme IV card but those tend to cost more than HD's, so you need a really good reason to have one. So, unless we have the money for that* (let alone an intel X25!) we gotta tweak settings to make things 'feel' fast.

(* - especially if the target box is a hand-me-down P4)

And for those of you who have money for faster flashes - this'll work well on those too. However if your power is flaky, these... er, wait, just get a UPS, your computer will thank you by lasting a bit longer.

First, edit /etc/fstab to make sure the SSD is mounted noatime and nodiratime.

Then here are some good tweaks. They can be run from /etc/rc.local, or from whatever better place you have for setting /proc/sys and /sys/... values.

# laptop mode holds onto writes for a while - up to 10 minutes with this setting.
echo 10 > /proc/sys/vm/laptop_mode
# only swap when absolutely necessary.
echo 0 > /proc/sys/vm/swappiness
# keep 'dirty' pages longer
echo 1500 > /proc/sys/vm/dirty_writeback_centisecs
echo 20 > /proc/sys/vm/dirty_ratio
echo 10 > /proc/sys/vm/dirty_background_ratio
# this scheduler will work better with flash
echo deadline > /sys/block/sda/queue/scheduler
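
To check that the scheduler change actually stuck, read /sys/block/sda/queue/scheduler back: the kernel lists the available schedulers on one line and puts the active one in square brackets. Here is a minimal sketch of pulling that name out in C. The function name active_scheduler is made up for this example, and the sample line in the comment is only an illustration of the format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) scheduler name from a
 * line such as "noop anticipatory [deadline] cfq" into out.
 * Returns 0 on success, -1 if there is no bracketed name or it does not fit. */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t len;

    assert(line != NULL && out != NULL);
    open = strchr(line, '[');
    if (open == NULL) return -1;
    close = strchr(open, ']');
    if (close == NULL) return -1;
    len = (size_t)(close - open - 1);
    if (len + 1 > outlen) return -1;   /* need room for the terminator */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Feed it the line read from the sysfs file; if the result is not "deadline", the echo above did not take.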

references:

http://ubuntuforums.org/showthread.php?t=1183113
http://www.ocztechnologyforum.com/forum/showthread.php?t=54379&page=2

Sunday, July 05, 2009

More random optimization notes:

I parallelized the DCT code differently - I eventually had a DCT function that performed 8 at once, which fits in with my batch system pretty well. This got me to about 3.9-4Gflops (measured) on the E5200 box I've been playing with... and about 7.2 at home.

Then I realized I did something absolutely boneheaded - I duplicated the same cosine table for each DCT process. I fixed that and now the quad gets 14.6(!) Gflops at peak... and the dual about 7.3.

TL;DR any extra memory accesses can kill you with SSE code. There's only so much bandwidth to go around - even on an i7 (which would be pretty darn cool to have for this stuff. i'll get one when i can get a nice complete rig for <$500)

Now to actually process pictures - and post some!

P.S. Made a Google Code repo at http://code.google.com/p/chadslab/ - the program won't make much sense, but... there are a few nice fragments.

And other notes while processing images:

- If you're doing anything too complicated to make gcc vectorize better, You're Doing It Wrong.

- Don't worry about tightening rarely run O(n) tasks with fewer than, say, 10,000 items, at least if you're running on current tech.

- If you're not vectorizing, double isn't that much slower than single-precision floats. But it eats bandwidth for breakfast (nom!)

- Give something enough power and a brick really will fly. On the E5200 I can do a 2D DCT+IDCT of a 1400x2100 picture in under 10 seconds. This sounds slow... until one does the math and finds that it does a gazillion multiply+accumulates.
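
To put a rough number on "a gazillion": assuming a naive separable DCT run over the whole image (that is an assumption for this sketch; the notes above don't spell out how the transform is tiled), every row of width w costs w*w multiply+accumulates and every column of height h costs h*h. The function name dct2d_macs is made up for the example.

```c
#include <assert.h>

/* Multiply+accumulates for one naive separable 2D DCT of a w x h image:
 * h rows at w*w each, plus w columns at h*h each. */
long long dct2d_macs(long long w, long long h)
{
    assert(w > 0 && h > 0);
    return h * w * w + w * h * h;
}
```

For 1400x2100 that is 10,290,000,000 per transform, so about 20.6 billion for DCT+IDCT. Spread over 10 seconds that is roughly 2 billion multiply+accumulates per second, or about 4 Gflops if the multiply and the add are counted separately.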

Friday, July 03, 2009

Image Processing part 2

(code going online later)

Did the code cleanup the other day... was mostly happy with the results unlike last time.

Then I started playing with DCT's just for the heck of it... and I figured out you could sharpen the image by boosting the middle/end coefficients. I still don't have the color enhancing effects added back into this version yet - once I do I'll probably start going through recent pictures and posting stuff.

I don't have a 'fast' DCT algorithm, but I do have access to an E5200 box that can sustain 1.4 GFlops. So after tweaking it takes about 15sec to do a 2D DCT+IDCT of a 1400x2100 image. When I get back home I'll have the Q8200 again - I bet that could do 2GFlops. And then there's the intel compiler to try out...

... but the real win would be transposing it to GPU code. The DCT algorithm I have now could be turned into shaders really easily... probably resulting in a 10x+ speedup w/a fast video card.

For now - the next step for image processing is to move from RGB to HSL. Having RGB*Y doesn't work very well for extreme adjustments...
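
For reference, a minimal sketch of the textbook RGB to HSL conversion. The function name rgb_to_hsl and the choice of ranges (inputs in 0..1, hue in degrees) are assumptions for the example; 16-bit data would need dividing by 65535 first.

```c
#include <assert.h>
#include <stddef.h>

/* r, g, b in [0,1]. Writes hue in degrees [0,360), saturation and
 * lightness in [0,1]. Greys get hue 0 and saturation 0. */
void rgb_to_hsl(double r, double g, double b, double *h, double *s, double *l)
{
    double max = r, min = r, d, t;

    assert(h != NULL && s != NULL && l != NULL);
    if (g > max) max = g;
    if (b > max) max = b;
    if (g < min) min = g;
    if (b < min) min = b;
    d = max - min;

    *l = (max + min) / 2.0;
    if (d == 0.0) {          /* grey: hue is undefined, report 0 */
        *h = 0.0;
        *s = 0.0;
        return;
    }

    t = 2.0 * *l - 1.0;      /* |2l - 1| without needing libm */
    if (t < 0.0) t = -t;
    *s = d / (1.0 - t);

    if (max == r)      *h = 60.0 * ((g - b) / d);
    else if (max == g) *h = 60.0 * ((b - r) / d + 2.0);
    else               *h = 60.0 * ((r - g) / d + 4.0);
    if (*h < 0.0) *h += 360.0;
}
```

With lightness split out like this, an extreme brightness adjustment changes L alone and leaves hue and saturation untouched.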

Notes:

- Don't bother using SSE intrinsics - setting up the C++ code to vectorize with gcc 4.3 is far easier, even if the results aren't quite as good.

- DCT itself is quite interesting - the 'slow' frequency change covers phase changes pretty well.
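
On the first note: the kind of loop gcc's vectorizer tends to handle is a plain counted loop over arrays that are promised not to overlap, with no branches in the body. A minimal sketch in C (the post's code is C++, and the function name scale_add is made up for the example):

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = a[i] * gain + b[i]
 * `restrict` promises the compiler that the three arrays do not overlap,
 * which is usually what it needs to know before it will vectorize. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float gain, size_t n)
{
    size_t i;
    assert(out != NULL && a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        out[i] = a[i] * gain + b[i];
}
```

Build with -O3 (or -O2 -ftree-vectorize). Whether a particular loop really gets vectorized depends on the compiler version and target, so it is worth checking the generated code rather than assuming.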

Tuesday, June 30, 2009

pseudoHDR - take 0.1.

I've been experimenting with reprocessing RAW files from my camera using dcraw and then taking the RGB data and reprocessing it.

The core idea is that my dSLR has 14 bits of range (and your typical .jpg? 8 bits). That gives us extra dynamic range to reprocess the picture and do an HDR-type effect.

Here's my first pass at it. Yes, this code is structured horribly (if you're a future prospective employer, I can do better than this. Really!) I might or might not rewrite it properly once I get back from vacation. I've got other things I wanna throw together, too.

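To use it, run dcraw with something like these settings: dcraw -c -h -a -4 -n 100 .CR2 and then pipe it through this program. (The -c makes dcraw write to standard output instead of a file, which is what lets you pipe it.)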

---

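#include <stdio.h>      /* fprintf, sscanf */
#include <stdlib.h>     /* malloc, abs */
#include <string.h>     /* memset, memcpy, strlen */
#include <unistd.h>     /* read, write */
#include <arpa/inet.h>  /* ntohs, htons */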

double Kb = 0.114, Kr = 0.299;

char line1[512], line2[512], line3[512];
int i = 0, x, y, px, py;
unsigned short int *pic, *out;
double *fpic;

double *Y, *Yorig, *Y2, *Y3, *Cr, *Cg, *Cb;
double minY = 65536, maxY = 0.0;

/* Brightening pass: divide each pixel by a factor derived from a
   distance-weighted average of its neighbours, first vertically
   (Y -> Y2), then horizontally (Y2 -> Y3). Dark neighbourhoods get
   the biggest boost. */
void process(int inverse, double f1, double f2)
{
#define R 4

    for (py = 0; py < y; py++) {
        // fprintf(stderr, "%d\n", py);
        for (px = 0; px < x; px++) {
            double total = 0, peak = 0, mult, factor = 0, tfactor = 0;
            int ty;
            for (ty = ((py - R) > 0) ? (py - R) : 0; (ty < (y - 1)) && ((ty - py) < R); ty++) {
                if (Y[(ty * x) + px] > peak) peak = Y[(ty * x) + px];
                factor = 1.0 / ((abs(ty - py)) + 1);
                total += (Y[(ty * x) + px] * factor);
                tfactor += factor;
            }
            mult = (65536.0 - (total / tfactor)) / 65536.0;
            mult = 1 - ((mult * mult) * f1);
            if (mult > 4) mult = 4;
            if (mult < 0.001) mult = 0.001;
            Y2[(py * x) + px] = Y[(py * x) + px] / mult;
        }
    }

    for (py = 0; py < y; py++) {
        // fprintf(stderr, "%d\n", py);
        for (px = 0; px < x; px++) {
            double total = 0, peak = 0, mult, factor = 0, tfactor = 0;
            int tx;
            for (tx = ((px - R) > 0) ? (px - R) : 0; (tx < (x - 1)) && ((tx - px) < R); tx++) {
                if (Y2[(py * x) + tx] > peak) peak = Y2[(py * x) + tx];
                factor = 1.0 / ((abs(tx - px)) + 1);
                total += (Y2[(py * x) + tx] * factor);
                tfactor += factor;
            }
            mult = (65536.0 - (total / tfactor)) / 65536.0;
            mult = 1 - ((mult * mult) * f2);
            if (mult > 4) mult = 4;
            if (mult < 0.001) mult = 0.001;
            Y3[(py * x) + px] = Y2[(py * x) + px] / mult;
            // fprintf(stderr, "%lf %lf %lf %lf\n", Y[(py * x) + px], Y2[(py * x)+ px], (total / tfactor), mult);
        }
    }
}

void processdark(int inverse, double f1, double f2)
{
    /* Darkening pass: same neighbourhood average as process(), but the
       pixel is multiplied by the factor instead of divided. The vertical
       half is compiled out, so Y is simply copied to Y2. */
#if 0
    for (py = 0; py < y; py++) {
        // fprintf(stderr, "%d\n", py);
        for (px = 0; px < x; px++) {
            double total = 0, peak = 0, mult, factor = 0, tfactor = 0;
            int ty;
            for (ty = ((py - R) > 0) ? (py - R) : 0; (ty < (y - 1)) && ((ty - py) < R); ty++) {
                if (Y[(ty * x) + px] > peak) peak = Y[(ty * x) + px];
                factor = 1.0 / ((abs(ty - py)) + 1);
                total += ((65536 - Y[(ty * x) + px]) * factor);
                tfactor += factor;
            }
            mult = (65536.0 - (total / tfactor)) / 65536.0;
            mult = 1 - ((mult * mult) * f1);
            if (mult > 4) mult = 4;
            if (mult <= 0) mult = 0;
            Y2[(py * x) + px] = Y[(py * x) + px] * mult;
        }
    }
#else
    memcpy(Y2, Y, sizeof(double) * x * y);
#endif
    for (py = 0; py < y; py++) {
        // fprintf(stderr, "%d\n", py);
        for (px = 0; px < x; px++) {
            double total = 0, peak = 0, mult, factor = 0, tfactor = 0;
            int tx;
            for (tx = ((px - R) > 0) ? (px - R) : 0; (tx < (x - 1)) && ((tx - px) < R); tx++) {
                if (Y2[(py * x) + tx] > peak) peak = Y2[(py * x) + tx];
                factor = 1.0 / ((abs(tx - px)) + 1);
                total += ((Y[(py * x) + tx]) * factor);
                tfactor += factor;
            }
            mult = (65536.0 - (total / tfactor)) / 65536.0;
            mult = 1 - ((mult * mult) * f2);
            if (mult > 4) mult = 4;
            if (mult <= 0) mult = 0;
            Y3[(py * x) + px] = Y2[(py * x) + px] * mult;
            // fprintf(stderr, "%lf %lf %lf %lf\n", Y2[(py * x) + px], Y3[(py * x) + px], (total / tfactor), mult);
        }
    }
}

/* Black-point stretch: invert Y, rescale so the darkest input pixel
   maps to 0, and write the result back into Y (via Y3). */
void processdark2(int inverse, double f1, double f2)
{
    double total = 0, avg;

    maxY = 0.0; minY = 65536.0;
    for (i = 0; i < x * y; i++) {
        if (Y[i] > maxY) maxY = Y[i];
        if (Y[i] < minY) minY = Y[i];
        if (Y[i] > 65535.0)
            Y2[i] = 0.0;
        else
            Y2[i] = 65535.0 - Y[i];
    }

    maxY = 0.0; minY = 65536.0;
    for (i = 0; i < x * y; i++) {
        if (Y2[i] > maxY) maxY = Y2[i];
        if (Y2[i] < minY) minY = Y2[i];
    }

    for (i = 0; i < x * y; i++) {
        total += Y3[i] = 65535.0 - (Y2[i] * (65535.0 / maxY));
    }

    avg = total / (x * y);

    for (i = 0; i < x * y; i++) {
        if (Y3[i] > (avg * 4)) {
            Y[i] = avg * 4;
        } else {
            Y[i] = Y3[i];
        }
        /* NOTE: this unconditional assignment overrides the clamp above,
           so the avg * 4 limit currently has no effect. */
        Y[i] = Y3[i];
    }
}

int main(int argc, char *argv[])
{
    double p1 = 0.8, p2 = 0.8;

    if (argc >= 2) sscanf(argv[1], "%lf", &p1);
    if (argc >= 3) sscanf(argv[2], "%lf", &p2);

    /* memset takes (buffer, value, length); the last two were swapped */
    memset(line1, 0, 512);

    /* read the first line */
    while ((i < 510) && (read(0, &line1[i], 1) == 1)) {
        if (line1[i] == '\n') break;
        i++;
    }
    line1[i + 1] = 0;

    memset(line2, 0, 512);
    i = 0;
    /* read the second line */
    while ((i < 510) && (read(0, &line2[i], 1) == 1)) {
        if (line2[i] == '\n') break;
        i++;
    }
    line2[i + 1] = 0;

    sscanf(line2, "%d %d", &x, &y);

    memset(line3, 0, 512);
    i = 0;
    /* read the third line */
    while ((i < 510) && (read(0, &line3[i], 1) == 1)) {
        if (line3[i] == '\n') break;
        i++;
    }
    line3[i + 1] = 0;

    pic = (unsigned short *)malloc(x * y * 6);
    out = (unsigned short *)malloc(x * y * 6);
    fpic = (double *)malloc(x * y * 3 * sizeof(double));
    /* a pipe can hand the pixel data over in pieces, so keep reading
       until the whole image is in */
    {
        size_t want = (size_t)x * y * 6, got = 0;
        ssize_t n;
        while (got < want && (n = read(0, (char *)pic + got, want - got)) > 0)
            got += (size_t)n;
    }
    memcpy(out, pic, (x * y * 6));

    Y = (double *)malloc(x * y * sizeof(double));
    Yorig = (double *)malloc(x * y * sizeof(double));
    Y2 = (double *)malloc(x * y * sizeof(double));
    Y3 = (double *)malloc(x * y * sizeof(double));
    Cr = (double *)malloc(x * y * sizeof(double));
    Cg = (double *)malloc(x * y * sizeof(double));
    Cb = (double *)malloc(x * y * sizeof(double));

    /* samples arrive big-endian; convert to host order and to double */
    for (i = 0; i < (x * y * 3); i++) {
        fpic[i] = ntohs(pic[i]);
    }

    /* split each pixel into luma (Y) and per-channel ratios to luma */
    for (i = 0; i < x * y; i++) {
        double r = fpic[(i * 3)], g = fpic[(i * 3) + 1], b = fpic[(i * 3) + 2];

        Y[i] = (0.299 * r) + (0.587 * g) + (0.114 * b);
        Yorig[i] = (0.299 * r) + (0.587 * g) + (0.114 * b);
        Cr[i] = r / Y[i];
        Cg[i] = g / Y[i];
        Cb[i] = b / Y[i];

        // Cr[i] = -(0.168736 * r) - (0.331264 * g) + (0.5 * b);
        // Cb[i] = +(0.5 * r) - (0.418688 * g) - (0.081312 * b);

        if (Y[i] > maxY) maxY = Y[i];
        if (Y[i] < minY) minY = Y[i];
    }

    fprintf(stderr, "%lf %lf\n", maxY, minY);

    /* pipeline: black-point stretch, brighten, then darken */
    processdark2(1, 0.8, 0.8);

    process(0, p1, p1);
    memcpy(Y, Y3, (x * y * sizeof(double)));
    maxY = 0.0; minY = 65536.0;
    for (i = 0; i < x * y; i++) {
        if (Y[i] > maxY) maxY = Y[i];
        if (Y[i] < minY) minY = Y[i];
    }
    fprintf(stderr, "%lf %lf\n", maxY, minY);
    processdark(1, p2, p2);

    /* rescale RGB by the change in luma; if a channel would overflow,
       pull the luma down so the largest channel lands exactly on 65535 */
    for (i = 0; i < x * y; i++) {
        double r1 = fpic[i * 3], g1 = fpic[(i * 3) + 1], b1 = fpic[(i * 3) + 2];
        // fprintf(stderr, "%lf %lf %lf ", fpic[i * 3], fpic[(i * 3) + 1], fpic[(i * 3) + 2]);
        fpic[(i * 3)] = r1 * (Y3[i] / Yorig[i]);
        fpic[(i * 3) + 1] = g1 * (Y3[i] / Yorig[i]);
        fpic[(i * 3) + 2] = b1 * (Y3[i] / Yorig[i]);

        double r2 = fpic[i * 3], g2 = fpic[(i * 3) + 1], b2 = fpic[(i * 3) + 2];
        double max = r2;
        if (g2 > max) max = g2;
        if (b2 > max) max = b2;

        if (max > 65535.0) {
            Y3[i] *= (65535.0 / max);
            fpic[(i * 3)] = r1 * (Y3[i] / Yorig[i]);
            fpic[(i * 3) + 1] = g1 * (Y3[i] / Yorig[i]);
            fpic[(i * 3) + 2] = b1 * (Y3[i] / Yorig[i]);
            // fprintf(stderr, "%lf %lf %lf %lf, ", Y3[i], fpic[(i * 3)], fpic[(i* 3) + 1], fpic[(i * 3) + 2]);
            // fprintf(stderr, "%lf %lf %lf, %d %d %d\n", fpic[(i * 3)], fpic[(i * 3) + 1], fpic[(i * 3) + 2], pic[i * 3], pic[(i * 3) + 1], pic[(i * 3) + 2]);
        }
        // if (fabs(g2 - g1) > 33)
        // fprintf(stderr, "%lf %lf %lf\n", r2 - r1, g2 - g1, b2 - b1);
    }

    maxY = 0.0; minY = 65536.0;
    for (i = 0; i < x * y; i++) {
        if (Y3[i] > maxY) maxY = Y3[i];
        if (Y3[i] < minY) minY = Y3[i];
    }
    fprintf(stderr, "%lf %lf\n", maxY, minY);

    /* clamp to 16 bits and convert back to big-endian */
    for (i = 0; i < (x * y * 3); i++) {
        if (fpic[i] < 0) fpic[i] = 0;
        if (fpic[i] > 65535) fpic[i] = 65535;
        out[i] = htons((unsigned short)fpic[i]);
        // fprintf(stderr, "%d %d %lf\n", pic[i], out[i], fpic[i]);
    }

    /* write the original three header lines back out, then the pixels */
    write(1, line1, strlen(line1));
    write(1, line2, strlen(line2));
    write(1, line3, strlen(line3));
    write(1, out, x * y * 6);
    return 0;
}

Monday, September 15, 2008

Ubuntu Intrepid Ibex (aka 8.10) + eee (701) = win!

I loaded up 8.10 alpha 5 onto my eee last night because I'm looking into setting up my eee as a car navigation system (among other things) and I didn't want to mess around with Xandros-specific stuff. I had heard 8.10 has many eee improvements so I decided to load it up. Here're some notes...

Things figured out:

- Wireless works once you disable the 'old' Atheros driver - either in hardware manager or adding the module (ath_pci) to the /etc/modprobe.d blacklist.

- I have Sprint 3G service with a Sierra 595U. It just plugs in, Network Manager sees it, and all I have to do is connect.

- Backlight control doesn't work with the fn keys, but a simple script that does the needed setpci call will adjust it just fine, and to brighter and dimmer settings. A dim setting would probably be good while running navigation software at night... ;)

Not done:

- I need to get the Asus ACPI module loaded in. There's a package dependency issue at the moment, and I need a DKMS(sp) package for the module otherwise.

- I still need to try Bluetooth out, especially since I have a Pharos PT120 GPS unit on its way. I have a nice small BT adapter which should work.

- Setting up the 8x16 font in the console applications. Maybe I can get 100x25 in Konsole.

Other gripes:

- My AData 16GB card mysteriously lost 4GB. This is annoying. If I were you I'd just keep buying Sandisk Ultra II's, or Extreme III 30MB/s cards if you have more money than me.

- Not having the money for a 8.9" netbook. ;) (I got my 701 in November... I'd been waiting for the eee to come out for a long time. :) )